Basseychrist/Exercise-Introduction-to-Matplotlib-and-Line-Plots

Introduction to Matplotlib and Line Plots — Exercise README

This README explains what each exercise question in DV0101EN-Exercise-Introduction-to-Matplotlib-and-Line-Plots--v2.ipynb does and why it is asked. It also includes a quick note about a common error and how to fix it.

How to run

  • Use a Python environment with Jupyter Notebook or Jupyter Lab.
  • Minimal required packages: pandas, numpy, matplotlib.
  • Example (PowerShell):
pip install pandas numpy matplotlib
jupyter notebook

Open DV0101EN-Exercise-Introduction-to-Matplotlib-and-Line-Plots--v2.ipynb and run the cells in order.


Notebook overview

The notebook teaches basic data wrangling with pandas and plotting with matplotlib (via pandas' plotting API). It uses a cleaned Canada immigration CSV and walks through:

  • Loading the dataset and setting Country as the index
  • Creating series/dataframes for specific countries or sets of countries
  • Plotting line charts to show trends over time
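The first two steps can be sketched on a tiny stand-in dataset. This is only an illustration: the inline CSV text, its three year columns, and all numbers are made up for the example, and the real notebook loads the full cleaned CSV instead. The names df_can, df_CI and years follow the notebook's usage described in this README.

```python
import io

import pandas as pd

# Hypothetical stand-in for the cleaned Canada immigration CSV.
# The values are invented; only the layout (countries x years) matters.
csv_text = """Country,1980,1981,1982
Haiti,10,20,30
India,100,110,120
China,50,60,70
"""

df_can = pd.read_csv(io.StringIO(csv_text))
df_can.set_index("Country", inplace=True)

# Column labels read from a CSV header are strings, e.g. "1980".
years = list(df_can.columns)

# One country -> a Series whose index is the years.
haiti = df_can.loc["Haiti", years]

# Several countries -> a DataFrame with countries as rows, years as columns.
df_CI = df_can.loc[["India", "China"], years]

print(type(haiti).__name__, df_CI.shape)
```

Selecting one label returns a Series, while selecting a list of labels returns a DataFrame. That difference is what drives the transpose step discussed in the questions.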

Below are the exercise questions and an explanation of each.

Questions and explanations

  1. Plot a line graph of immigration from Haiti using df.plot()
  • What it does: selects the Haiti row from df_can as a pandas Series (years are the index) and calls .plot().
  • Why this is asked: demonstrates plotting a single time series. It shows that when you plot a Series, pandas uses the Series' index (years) on the x-axis automatically.
  • Key point: because the result is a Series, you do not need to transpose; you can convert the index to integers for nicer x-axis labels (index.map(int)).
  2. Annotate the 2010 earthquake on the Haiti plot
  • What it does: uses plt.text() to add a label at a chosen (x, y) data coordinate.
  • Why: teaches basic annotation and the importance of index data types. If the years are integers, you annotate by year value (for example, x=2000); if they are strings, the points sit at positions 0, 1, 2, ..., so you annotate by positional index (x=20 for the year 2000 when the data starts at 1980).
  3. Compare the number of immigrants from India and China (two-part exercise)
  • Step 1: Get the data set for China and India and display the dataframe:

    • What it does: df_CI = df_can.loc[['India', 'China'], years] selects the two countries (rows) and the year columns (1980–2013). This produces a DataFrame with countries as the index and years as columns, where the column labels are strings.
    • Why: practices selecting multiple rows using .loc and reinforces the layout of the data (countries × years).
  • Step 2: Plot graph using kind='line':

    • What the student is expected to try: df_CI.plot(kind='line').

    • Why: explicitly specifying kind='line' shows how to choose the plot type. However, pandas treats the DataFrame's index as x-values and the columns as series to plot. Since df_CI has countries as the index and years as columns, plotting it directly results in country names on the x-axis (not what we want).

    • Correct approach: transpose the dataframe so that years become the index and countries become columns. Then convert the year index to integers and plot:

      df_CI_T = df_CI.transpose()
      df_CI_T.index = df_CI_T.index.map(int)
      df_CI_T.plot(kind='line')

    • Why this is important: clarifies the difference between Series and DataFrame plotting, demonstrates transposing to align the independent variable (years) on the x-axis, and shows explicit kind='line' usage.

  4. Compare the trend of the top 5 countries that contributed the most to immigration
  • What it does: sorts df_can by a previously-created Total column (cumulative immigrants), takes the top 5 rows, transposes the years into the index, converts the index to integers, and plots multiple lines on the same chart.
  • Why: practices sorting, selecting top rows, transposing for plotting, and plotting multiple series together for comparison. It also demonstrates controlling plot size with figsize and labeling axes.
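The top-5 steps described above (sort by Total, take five rows, transpose, convert the index, plot) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the six-row DataFrame and its numbers are invented, the figsize and axis labels are arbitrary choices, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter

import pandas as pd

# Toy data: the country names are reused for readability, the numbers are made up.
years = ["1980", "1981", "1982"]
df_can = pd.DataFrame(
    {
        "1980": [10, 100, 50, 5, 70, 1],
        "1981": [20, 110, 60, 5, 80, 2],
        "1982": [30, 120, 70, 5, 90, 3],
    },
    index=["Haiti", "India", "China", "Fiji", "Philippines", "Malta"],
)

# Cumulative immigrants per country across all year columns.
df_can["Total"] = df_can[years].sum(axis=1)

# Sort descending by Total and keep the five largest rows.
df_top5 = df_can.sort_values(by="Total", ascending=False).head(5)

# Keep only the year columns, then transpose so years become the index.
df_top5 = df_top5[years].transpose()
df_top5.index = df_top5.index.map(int)

# One line per country on a single, larger figure.
ax = df_top5.plot(kind="line", figsize=(14, 8))
ax.set_title("Immigration trend of top 5 countries (toy data)")
ax.set_xlabel("Years")
ax.set_ylabel("Number of immigrants")
```

Dropping the Total column before transposing matters: otherwise 'Total' would end up in the index and index.map(int) would fail on it.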

Common pitfall (ValueError when mapping index to int)

  • Error: ValueError: invalid literal for int() with base 10: 'India'
  • Cause: calling df.index = df.index.map(int) on a DataFrame whose index contains country names instead of year strings (this happens when you try to convert df_CI.index to int before transposing).
  • Fix: transpose the DataFrame first so the index contains year strings, then map to int. Example:
# wrong - index contains country names
# df_CI.index = df_CI.index.map(int)

# correct
df_CI_T = df_CI.transpose()
df_CI_T.index = df_CI_T.index.map(int)
df_CI_T.plot(kind='line')
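
To see the error and the fix side by side without plotting, the following sketch reproduces the pitfall on a two-row toy DataFrame. The values are made up, and the error_message variable is introduced only for this illustration.

```python
import pandas as pd

# Toy df_CI: countries as the index, year strings as columns (made-up values).
df_CI = pd.DataFrame(
    {"1980": [100, 50], "1981": [110, 60]},
    index=["India", "China"],
)

# Wrong order: the index still holds country names, so int() fails.
try:
    df_CI.index.map(int)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Right order: transpose first, then the index holds year strings.
df_CI_T = df_CI.transpose()
df_CI_T.index = df_CI_T.index.map(int)

print(df_CI_T.index.tolist())
```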

Tips and notes

  • When plotting a single country as a Series, you don't need to transpose; the Series index should be years.
  • When plotting multiple countries from a DataFrame where countries are rows and years are columns, transpose so that the years are the DataFrame index and each country becomes a column.
  • Use kind='line' explicitly to make the intention clear; .plot() defaults to a line plot for both Series and DataFrames, but being explicit helps readability.
  • Convert year strings to integers for nicer x-axis tick formatting and to use data coordinates when annotating.
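
As a rough illustration of the first and last tips, the sketch below plots a single Series with an integer year index and places a label using data coordinates. The Series values, the label position (2009, 4000), and the label text are assumptions made for this example; the Agg backend is set only so the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter

import matplotlib.pyplot as plt
import pandas as pd

# Toy Haiti series with string years, as they would come from the CSV columns.
haiti = pd.Series(
    [1000, 1200, 4500, 3000],
    index=["2008", "2009", "2010", "2011"],
    name="Haiti",
)

# Integer years give clean tick labels and real data coordinates on the x-axis.
haiti.index = haiti.index.map(int)

ax = haiti.plot(kind="line")
ax.set_xlabel("Years")
ax.set_ylabel("Number of immigrants")

# With an integer index, x is a year value rather than a position.
label = plt.text(2009, 4000, "2010 Earthquake")
```

If the index were left as strings, the same label would need a positional x such as 1 instead of 2009.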

