## Example from class

Here is the example I worked through in class, with some notes to go along with it.

The example walks through the steps of answering a question about tabular data using pandas.

Our question about this toy data set is: "For the locations that got more than 40 cm of snow on January 25, how much snow did they receive on January 15?"
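Before reaching for pandas, it can help to see what the question actually asks for. The sketch below answers it in plain Python, working directly on the same lists that appear in `snow_data` below; the name `answer` is just for illustration.

```python
# Same toy data as in the notebook cells below
snow_data = {
    "Location": ["Downtown Toronto", "Toronto Pearson", "Ottawa"],
    "Snow Jan 15": [21, 22, 18],
    "Snow Jan 25": [56, 42, 2],
}

# Walk the three lists in parallel and keep the Jan 15 value
# for every location whose Jan 25 snowfall is over 40 cm
answer = {}
for location, jan15, jan25 in zip(
    snow_data["Location"], snow_data["Snow Jan 15"], snow_data["Snow Jan 25"]
):
    if jan25 > 40:
        answer[location] = jan15

print(answer)  # {'Downtown Toronto': 21, 'Toronto Pearson': 22}
```

The pandas steps below do the same filtering and selecting, but on whole columns at once instead of one row at a time.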

In [1]:
# Here is the data we are starting with expressed as a dictionary
snow_data = {
    "Location": ["Downtown Toronto", "Toronto Pearson", "Ottawa"],
    "Snow Jan 15": [21, 22, 18],
    "Snow Jan 25": [56, 42, 2]
}

snow_data

{'Location': ['Downtown Toronto', 'Toronto Pearson', 'Ottawa'],
 'Snow Jan 15': [21, 22, 18],
 'Snow Jan 25': [56, 42, 2]}

In [3]:
import pandas as pd
# convert the dictionary to a dataframe
snow_df = pd.DataFrame(snow_data)

snow_df

|   | Location | Snow Jan 15 | Snow Jan 25 |
|---|---|---|---|
| 0 | Downtown Toronto | 21 | 56 |
| 1 | Toronto Pearson | 22 | 42 |
| 2 | Ottawa | 18 | 2 |
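Each key of the dictionary became a column, and pandas supplied a default integer index (0, 1, 2) as the row labels. A quick way to confirm what was built is to look at the columns, the shape, and the index:

```python
import pandas as pd

snow_data = {
    "Location": ["Downtown Toronto", "Toronto Pearson", "Ottawa"],
    "Snow Jan 15": [21, 22, 18],
    "Snow Jan 25": [56, 42, 2],
}
snow_df = pd.DataFrame(snow_data)

# Each dictionary key becomes a column label, in insertion order
print(list(snow_df.columns))  # ['Location', 'Snow Jan 15', 'Snow Jan 25']
# (number of rows, number of columns)
print(snow_df.shape)          # (3, 3)
# Default row labels are 0, 1, 2
print(list(snow_df.index))    # [0, 1, 2]
```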


In [4]:
# Select the "Snow Jan 25" column as a Series and store it in latest_snow
latest_snow = snow_df["Snow Jan 25"]

# Create a boolean Series that is True wherever the value in latest_snow
# is greater than 40
too_much_snow = latest_snow > 40

# Use the boolean Series to keep only the rows where it is True

tms_df = snow_df[too_much_snow]
tms_df

|   | Location | Snow Jan 15 | Snow Jan 25 |
|---|---|---|---|
| 0 | Downtown Toronto | 21 | 56 |
| 1 | Toronto Pearson | 22 | 42 |
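It can help to look at the intermediate boolean Series directly. It has one True/False value per row, with the same row labels as `snow_df`. Notice too that the filtered rows keep their original labels (0 and 1); they are not renumbered.

```python
import pandas as pd

snow_df = pd.DataFrame({
    "Location": ["Downtown Toronto", "Toronto Pearson", "Ottawa"],
    "Snow Jan 15": [21, 22, 18],
    "Snow Jan 25": [56, 42, 2],
})

too_much_snow = snow_df["Snow Jan 25"] > 40

# One True/False per row: 56 > 40, 42 > 40, 2 > 40
print(too_much_snow.tolist())                  # [True, True, False]
# Indexing with the mask keeps only the rows labelled True
print(snow_df[too_much_snow].index.tolist())   # [0, 1]
```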


In [None]:
# Keep only the location and its Jan 15 snowfall

important_columns = ["Location", "Snow Jan 15"]
tms_df[important_columns]


|   | Location | Snow Jan 15 |
|---|---|---|
| 0 | Downtown Toronto | 21 |
| 1 | Toronto Pearson | 22 |
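The column selection above passes a *list* of column names, which is why it gives back a DataFrame. Passing a single name gives back a Series instead. This is also why the one-line version in the next cell uses double brackets around the column names.

```python
import pandas as pd

snow_df = pd.DataFrame({
    "Location": ["Downtown Toronto", "Toronto Pearson", "Ottawa"],
    "Snow Jan 15": [21, 22, 18],
    "Snow Jan 25": [56, 42, 2],
})
tms_df = snow_df[snow_df["Snow Jan 25"] > 40]

one_col = tms_df["Snow Jan 15"]                  # a single label -> Series
two_cols = tms_df[["Location", "Snow Jan 15"]]   # a list of labels -> DataFrame

print(type(one_col).__name__)   # Series
print(type(two_cols).__name__)  # DataFrame
```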


In [None]:
# It is an *excellent* idea to build these operations up step by step, as we did above.
# However, we can also do it all on one line:

snow_df[snow_df["Snow Jan 25"] > 40][["Location", "Snow Jan 15"]]


|   | Location | Snow Jan 15 |
|---|---|---|
| 0 | Downtown Toronto | 21 |
| 1 | Toronto Pearson | 22 |
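pandas also offers `.loc`, which takes a row selector and a column selector inside a single pair of brackets. The sketch below is an alternative to the chained `[...][...]` form above (not the approach we used in class) and should give the same table.

```python
import pandas as pd

snow_df = pd.DataFrame({
    "Location": ["Downtown Toronto", "Toronto Pearson", "Ottawa"],
    "Snow Jan 15": [21, 22, 18],
    "Snow Jan 25": [56, 42, 2],
})

# .loc[rows, columns]: rows chosen by the boolean mask, columns by a list of names
result = snow_df.loc[snow_df["Snow Jan 25"] > 40, ["Location", "Snow Jan 15"]]

# The chained version from the cell above, for comparison
chained = snow_df[snow_df["Snow Jan 25"] > 40][["Location", "Snow Jan 15"]]

print(result)
print(result.equals(chained))  # True
```

Doing the selection in one `.loc` call avoids creating the intermediate filtered DataFrame, which becomes more important later when you want to *assign* values to a selection.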


In [12]:
# Notice that snow_df hasn't changed

snow_df

|   | Location | Snow Jan 15 | Snow Jan 25 |
|---|---|---|---|
| 0 | Downtown Toronto | 21 | 56 |
| 1 | Toronto Pearson | 22 | 42 |
| 2 | Ottawa | 18 | 2 |
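Boolean indexing and column selection build *new* DataFrames rather than editing the original, which is why `snow_df` still has all three rows. To keep a filtered result, assign it to a name, as we did with `tms_df`. A small check of that behaviour:

```python
import pandas as pd

snow_df = pd.DataFrame({
    "Location": ["Downtown Toronto", "Toronto Pearson", "Ottawa"],
    "Snow Jan 15": [21, 22, 18],
    "Snow Jan 25": [56, 42, 2],
})

tms_df = snow_df[snow_df["Snow Jan 25"] > 40]

print(len(snow_df), len(tms_df))  # 3 2
print(tms_df is snow_df)          # False: a separate object
```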


In [None]:
# Another thought that we will talk about next week or the week after
# How to get the total accumulation of snow?

# Here is one way (I had to look up the arguments to sum)
#total_snow = snow_df.sum(axis=1, numeric_only=True)

# Here is another way: add the two columns (Series) together, element by element.
total_snow = snow_df["Snow Jan 15"] + snow_df["Snow Jan 25"]

total_snow


0    77
1    64
2    20
dtype: int64
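Both approaches from the cell above give the same totals. `sum(axis=1, numeric_only=True)` adds across each row (`axis=1`) and skips the text column `Location` (`numeric_only=True`), while adding the two columns lines up values by row label:

```python
import pandas as pd

snow_df = pd.DataFrame({
    "Location": ["Downtown Toronto", "Toronto Pearson", "Ottawa"],
    "Snow Jan 15": [21, 22, 18],
    "Snow Jan 25": [56, 42, 2],
})

# Way 1: sum across each row, ignoring non-numeric columns
row_sums = snow_df.sum(axis=1, numeric_only=True)

# Way 2: add the two numeric columns element by element
pairwise = snow_df["Snow Jan 15"] + snow_df["Snow Jan 25"]

print(row_sums.tolist())  # [77, 64, 20]
print(pairwise.tolist())  # [77, 64, 20]
```

The row-wise `sum` scales to any number of numeric columns; the `+` version is more explicit about exactly which columns are being added.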

In [None]:
# Now let's add this as an additional column to the dataframe.
# (We haven't covered this in class yet. It just fit nicely in this example.)

snow_df["Total"] = snow_df["Snow Jan 15"] + snow_df["Snow Jan 25"]

snow_df


|   | Location | Snow Jan 15 | Snow Jan 25 | Total |
|---|---|---|---|---|
| 0 | Downtown Toronto | 21 | 56 | 77 |
| 1 | Toronto Pearson | 22 | 42 | 64 |
| 2 | Ottawa | 18 | 2 | 20 |
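Unlike the filtering steps earlier, the assignment `snow_df["Total"] = ...` *does* change `snow_df` in place. If you would rather leave the original untouched, `DataFrame.assign` returns a new DataFrame with the extra column instead. This is a sketch of an alternative we have not covered in class; the name `with_total` is just for illustration.

```python
import pandas as pd

snow_df = pd.DataFrame({
    "Location": ["Downtown Toronto", "Toronto Pearson", "Ottawa"],
    "Snow Jan 15": [21, 22, 18],
    "Snow Jan 25": [56, 42, 2],
})

# assign returns a NEW DataFrame that includes the Total column
with_total = snow_df.assign(Total=snow_df["Snow Jan 15"] + snow_df["Snow Jan 25"])

print(list(with_total.columns))  # ['Location', 'Snow Jan 15', 'Snow Jan 25', 'Total']
print("Total" in snow_df.columns)  # False: the original is unchanged
```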
