This README explains what each exercise question in DV0101EN-Exercise-Introduction-to-Matplotlib-and-Line-Plots--v2.ipynb does and why it is asked. It also includes a quick note about a common error and how to fix it.
- Use a Python environment with Jupyter Notebook or Jupyter Lab.
- Minimal required packages: pandas, numpy, matplotlib.
- Example (PowerShell):

      pip install pandas numpy matplotlib
      jupyter notebook

Open DV0101EN-Exercise-Introduction-to-Matplotlib-and-Line-Plots--v2.ipynb and run the cells in order.
The notebook teaches basic data wrangling with pandas and plotting with matplotlib (via pandas' plotting API). It uses a cleaned Canada immigration CSV and walks through:
- Loading the dataset and setting `Country` as the index
- Creating Series/DataFrames for specific countries or sets of countries
- Plotting line charts to show trends over time
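The loading step can be sketched on a toy CSV. This is a minimal illustration, not the notebook's actual loading cell: the inline CSV text and its numbers are invented stand-ins for the cleaned dataset, and only the `Country` column name and the string-valued year columns are taken from the description above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the cleaned CSV; the numbers are invented.
csv_text = """Country,1980,1981,1982
Haiti,10,20,30
India,100,110,120
China,90,95,105
"""

df_can = pd.read_csv(io.StringIO(csv_text))
df_can = df_can.set_index("Country")  # country names become the row labels

# Column headers read from a CSV are strings, so the year list is built as strings too.
years = [str(y) for y in range(1980, 1983)]

print(df_can.loc["Haiti", years])
```

With `Country` as the index, a country name can be passed straight to `.loc`, which is what the exercises below rely on.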
Below are the exercise questions and an explanation of each.
- Plot a line graph of immigration from Haiti using `df.plot()`
  - What it does: selects the Haiti row from `df_can` as a pandas Series (years are the index) and calls `.plot()`.
  - Why this is asked: demonstrates plotting a single time series. It shows that when you plot a Series, pandas uses the Series' index (years) on the x-axis automatically.
  - Key point: because the result is a Series, you do not need to transpose; you can convert the index to integers for nicer x-axis labels (`index.map(int)`).
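A minimal sketch of the Haiti exercise, assuming a toy `df_can` with invented numbers in place of the real dataset. It shows that a single row selected with `.loc` is already a Series indexed by year, so no transpose is needed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside Jupyter
import pandas as pd

years = [str(y) for y in range(1980, 1986)]

# Toy numbers, not real immigration counts.
df_can = pd.DataFrame([[10, 12, 15, 11, 14, 18]], index=["Haiti"], columns=years)

haiti = df_can.loc["Haiti", years]   # a Series whose index holds the year strings
haiti.index = haiti.index.map(int)   # integer years give numeric x-axis ticks

ax = haiti.plot(kind="line")
ax.set_xlabel("Year")
ax.set_ylabel("Number of immigrants")
```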
- Annotate the 2010 earthquake on the Haiti plot
  - What it does: uses `plt.text()` to add a label at a chosen (x, y) data coordinate.
  - Why: teaches basic annotation and the importance of index data types. If the years are integers, you can annotate by year value; if they are strings, you annotate by positional index.
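The effect of the index type on annotation coordinates can be sketched as follows. The series values and the label text are invented for illustration, and the sketch calls `ax.text()` on an explicit axes object, which places text the same way `plt.text()` does on the current axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Toy values, not real immigration counts.
haiti = pd.Series([10, 11, 12, 13, 14, 40, 35, 30, 28], index=range(2005, 2014))

# Integer index: the x coordinate is the year itself.
ax_int = haiti.plot(kind="line")
label_int = ax_int.text(2010, haiti.loc[2010], "2010 Earthquake")

# String index: pandas plots against positions 0..n-1, so x is the row position.
haiti_str = haiti.copy()
haiti_str.index = haiti_str.index.map(str)
fig, ax_str = plt.subplots()
haiti_str.plot(kind="line", ax=ax_str)
pos = list(haiti_str.index).index("2010")
label_str = ax_str.text(pos, haiti_str.loc["2010"], "2010 Earthquake")
```

Looking up the position with `list(...).index("2010")` avoids hard-coding an offset that would silently break if the year range changed.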
- Compare the number of immigrants from India and China (two-part exercise)
  - Step 1: Get the data set for China and India and display the dataframe:
    - What it does: `df_CI = df_can.loc[['India', 'China'], years]` selects the two countries (rows) and the columns for the years (1980–2013). This produces a DataFrame with countries as the index and year columns as strings.
    - Why: practices selecting multiple rows using `.loc` and reinforces the layout of the data (countries × years).
  - Step 2: Plot graph using `kind='line'`:
    - What the student is expected to try: `df_CI.plot(kind='line')`.
    - Why: explicitly specifying `kind='line'` shows how to choose the plot type. However, pandas treats the DataFrame's index as x-values and the columns as series to plot. Since `df_CI` has countries as the index and years as columns, plotting it directly results in country names on the x-axis (not what we want).
    - Correct approach: transpose the dataframe so that years become the index and countries become columns. Then convert the year index to integers and plot:

          df_CI_T = df_CI.transpose()
          df_CI_T.index = df_CI_T.index.map(int)
          df_CI_T.plot(kind='line')

    - Why this is important: clarifies the difference between Series and DataFrame plotting, demonstrates transposing to align the independent variable (years) on the x-axis, and shows explicit `kind='line'` usage.
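Both steps can be run end to end on a toy frame to check the shapes involved. The numbers here are invented and the year range is shortened; only the variable names `df_can`, `df_CI`, `df_CI_T`, and `years` follow the exercise.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside Jupyter
import pandas as pd

years = [str(y) for y in range(1980, 1984)]

# Toy numbers, not real immigration counts.
df_can = pd.DataFrame(
    [[100, 110, 120, 130], [90, 95, 105, 115], [10, 12, 15, 11]],
    index=["India", "China", "Haiti"],
    columns=years,
)

# Step 1: two rows (countries) by the year columns.
df_CI = df_can.loc[["India", "China"], years]
print(df_CI.shape)

# Step 2: years must be the index before plotting, one column per country.
df_CI_T = df_CI.transpose()
df_CI_T.index = df_CI_T.index.map(int)
ax = df_CI_T.plot(kind="line")
print(list(df_CI_T.columns))
```

After the transpose, each column becomes one line on the chart and the integer years supply the x-values.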
- Compare the trend of the top 5 countries that contributed the most to immigration
  - What it does: sorts `df_can` by a previously created `Total` column (cumulative immigrants), takes the top 5 rows, transposes the years into the index, converts the index to integers, and plots multiple lines on the same chart.
  - Why: practices sorting, selecting top rows, transposing for plotting, and plotting multiple series together for comparison. It also demonstrates controlling plot size with `figsize` and labeling axes.
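The top-5 pipeline described above might look like the sketch below. The country labels `A` to `G` and their values are placeholders, the `Total` column is computed here because the toy frame does not have one yet, and the `figsize` value and axis labels are illustrative choices rather than the notebook's.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside Jupyter
import pandas as pd

years = [str(y) for y in range(1980, 1983)]

# Placeholder countries and invented numbers.
data = {
    "A": [5, 6, 7],
    "B": [50, 60, 70],
    "C": [20, 20, 20],
    "D": [1, 2, 3],
    "E": [30, 35, 40],
    "F": [10, 10, 10],
    "G": [40, 45, 50],
}
df_can = pd.DataFrame.from_dict(data, orient="index", columns=years)
df_can["Total"] = df_can[years].sum(axis=1)

# Sort by Total, keep the five largest rows, then put years on the index.
df_top5 = df_can.sort_values("Total", ascending=False).head(5)[years].transpose()
df_top5.index = df_top5.index.map(int)

ax = df_top5.plot(kind="line", figsize=(14, 8))  # figsize is width, height in inches
ax.set_xlabel("Year")
ax.set_ylabel("Number of immigrants")
```

Selecting `[years]` before transposing drops the `Total` column, which would otherwise become an extra, non-year row in the index.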
- Error: `ValueError: invalid literal for int() with base 10: 'India'`
  - Cause: calling `df.index = df.index.map(int)` on a DataFrame whose index contains country names instead of year strings (this happens when you try to convert `df_CI.index` to int before transposing).
  - Fix: transpose the DataFrame first so the index contains year strings, then map to int. Example:

        # wrong - index contains country names
        # df_CI.index = df_CI.index.map(int)
        # correct
        df_CI_T = df_CI.transpose()
        df_CI_T.index = df_CI_T.index.map(int)
        df_CI_T.plot(kind='line')

- When plotting a single country as a Series, you don't need to transpose; the Series index should be years.
- When plotting multiple countries from a DataFrame where countries are rows and years are columns, transpose so that the years are the DataFrame index and each country becomes a column.
- Use `kind='line'` explicitly to make the intention clear; `.plot()` defaults to a line plot for both Series and DataFrames, but being explicit helps readability.
- Convert year strings to integers for nicer x-axis tick formatting and to use data coordinates when annotating.
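The `ValueError` discussed earlier can be reproduced safely on a toy frame, which makes the ordering rule (transpose first, then convert) easy to verify. The numbers are invented, and the `try`/`except` wrapper is only there so the sketch runs to completion.

```python
import pandas as pd

years = ["1980", "1981"]

# Toy numbers, not real immigration counts.
df_CI = pd.DataFrame([[100, 110], [90, 95]], index=["India", "China"], columns=years)

# Converting before transposing fails because the index holds country names.
try:
    df_CI.index.map(int)
    message = None
except ValueError as exc:
    message = str(exc)
print(message)

# Transposing first puts the year strings on the index, so the conversion succeeds.
df_CI_T = df_CI.transpose()
df_CI_T.index = df_CI_T.index.map(int)
print(list(df_CI_T.index))
```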