# Chapter 2: Data Manipulation with Pandas


Import packages and load data:

In [1]:
import pandas as pd 
import numpy as np

In [4]:
state_avg_df = pd.read_csv('data/model_state.csv')
state_by_year_df = pd.read_csv('data/climdiv_state_year.csv')

## Recipe 1: Data Transformation in Pandas - Summary


#### Filtering Data
We kept only the states whose annual temperature change exceeds 1.5°C, using a boolean condition on the `Annual` column. This lets us focus on the states experiencing the most substantial temperature shifts.

In [5]:
significant_change_states = state_avg_df[state_avg_df['Annual'] > 1.5]
significant_change_states

| | fips | Fall | Spring | Summer | Winter | max_warming_season | Annual | STUSAB | STATE_NAME | STATENS |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 9 | 1.453093 | 1.543407 | 1.580628 | 2.633975 | Winter | 1.801492 | CT | Connecticut | 1779780 |
| 6 | 10 | 1.378949 | 1.537848 | 1.522878 | 2.201002 | Winter | 1.661683 | DE | Delaware | 1779781 |
| 16 | 23 | 1.463238 | 1.308374 | 1.42757 | 2.882751 | Winter | 1.744621 | ME | Maine | 1779787 |
| 18 | 25 | 1.4423 | 1.350744 | 1.59461 | 2.43516 | Winter | 1.700653 | MA | Massachusetts | 606926 |
| 19 | 26 | 1.149714 | 1.692995 | 1.050526 | 2.463788 | Winter | 1.576899 | MI | Michigan | 1779789 |
| 20 | 27 | 1.148642 | 1.444557 | 0.907259 | 2.716586 | Winter | 1.555189 | MN | Minnesota | 662849 |
| 26 | 33 | 1.331429 | 1.333616 | 1.29679 | 2.579852 | Winter | 1.62243 | NH | New Hampshire | 1779794 |
| 27 | 34 | 1.655732 | 1.759266 | 1.738723 | 2.748938 | Winter | 1.977277 | NJ | New Jersey | 1779795 |
| 31 | 38 | 1.115584 | 1.331175 | 1.000409 | 3.145933 | Winter | 1.65443 | ND | North Dakota | 1779797 |
| 34 | 41 | 1.301573 | 1.240705 | 1.61322 | 1.819386 | Winter | 1.500649 | OR | Oregon | 1155107 |


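Ten states meet the threshold: Connecticut, Delaware, Maine, Massachusetts, Michigan, Minnesota, New Hampshire, New Jersey, North Dakota, and Oregon. Their annual changes range from about 1.50°C (Oregon) to about 1.98°C (New Jersey), and in every one of them Winter is the season with the most warming.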
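The filter works because the comparison returns a boolean Series that is then used to index the DataFrame. The sketch below illustrates this on four rows copied from the output above rather than on the full CSV. The name `sample_df` and the 1.7 and 2.7 thresholds are assumptions chosen only for illustration.

```python
import pandas as pd

# Four rows copied from the filtered output above (not the full dataset).
sample_df = pd.DataFrame({
    "STATE_NAME": ["Connecticut", "Delaware", "Maine", "Oregon"],
    "Winter": [2.633975, 2.201002, 2.882751, 1.819386],
    "Annual": [1.801492, 1.661683, 1.744621, 1.500649],
})

# The comparison yields a boolean Series; indexing with it keeps the True rows.
# 1.7 is an illustrative threshold, not the 1.5 used in the recipe.
mask = sample_df["Annual"] > 1.7
by_mask = sample_df[mask]

# DataFrame.query expresses the same filter as a string expression.
by_query = sample_df.query("Annual > 1.7")

# Conditions combine with & (and) or | (or); each one needs its own parentheses.
both = sample_df[(sample_df["Annual"] > 1.7) & (sample_df["Winter"] > 2.7)]

print(by_mask["STATE_NAME"].tolist())
print(both["STATE_NAME"].tolist())
```

The mask form and the `query` form select the same rows; which one to use is mostly a matter of readability.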

#### Grouping and Aggregating
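We calculated the average temperature change for each season across all states by taking the column-wise mean of the four seasonal columns. No explicit grouping is needed here, because each season is already its own column. Winter shows the largest average change (about 1.67), roughly double that of Fall and Summer.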

In [6]:
average_seasonal_change = state_avg_df[['Fall', 'Spring', 'Summer', 'Winter']].mean()
average_seasonal_change

Fall      0.785324
Spring    1.004280
Summer    0.773815
Winter    1.668654
dtype: float64
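As a rough sketch of how the same aggregation extends to several statistics and to an actual `groupby`, the example below uses three rows copied from the filtering output earlier in this recipe. The name `sample_df` is an assumption, and because all three sample rows have Winter as their `max_warming_season`, the grouped result has a single group here; the full dataset may contain more.

```python
import pandas as pd

# Three rows copied from the filtering output earlier in this recipe.
sample_df = pd.DataFrame({
    "STATE_NAME": ["Connecticut", "Delaware", "Maine"],
    "Fall": [1.453093, 1.378949, 1.463238],
    "Spring": [1.543407, 1.537848, 1.308374],
    "Summer": [1.580628, 1.522878, 1.42757],
    "Winter": [2.633975, 2.201002, 2.882751],
    "max_warming_season": ["Winter", "Winter", "Winter"],
    "Annual": [1.801492, 1.661683, 1.744621],
})

seasons = ["Fall", "Spring", "Summer", "Winter"]

# Column-wise mean, as in the cell above: one value per season.
seasonal_mean = sample_df[seasons].mean()

# agg with a list of function names returns one row per statistic.
seasonal_summary = sample_df[seasons].agg(["mean", "min", "max"])

# idxmax returns the label (season name) of the largest mean.
warmest_season = seasonal_mean.idxmax()

# An explicit groupby: mean annual change per max_warming_season label.
annual_by_group = sample_df.groupby("max_warming_season")["Annual"].mean()

print(seasonal_summary)
print(warmest_season)
print(annual_by_group)
```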

#### Sorting
We sorted the states by their Summer temperature change in descending order and kept only the `STATE_NAME` and `Summer` columns, so the states with the largest summer changes appear first.

In [8]:
sorted_summer_change = state_avg_df.sort_values(by='Summer', ascending=False)[['STATE_NAME', 'Summer']]
sorted_summer_change.head()

| | STATE_NAME | Summer |
|---|---|---|
| 36 | Rhode Island | 2.114864 |
| 27 | New Jersey | 1.738723 |
| 41 | Utah | 1.702758 |
| 34 | Oregon | 1.61322 |
| 18 | Massachusetts | 1.59461 |


Rhode Island, New Jersey, and Utah show the largest summer temperature increases, followed by Oregon and Massachusetts.
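When only the top few rows are needed, `nlargest` is a possible alternative to sorting the whole frame and calling `head`. The sketch below compares the two on the five rows shown in the output above, entered in a deliberately shuffled order. The names `sample_df` and `ranked` are assumptions for illustration.

```python
import pandas as pd

# The five rows from the sorted output above, entered in shuffled order.
sample_df = pd.DataFrame({
    "STATE_NAME": ["Oregon", "Rhode Island", "Massachusetts", "Utah", "New Jersey"],
    "Summer": [1.61322, 2.114864, 1.59461, 1.702758, 1.738723],
})

# Full sort, then take the first three rows (the approach used in the recipe).
by_sort = sample_df.sort_values(by="Summer", ascending=False).head(3)

# nlargest returns the three rows with the highest Summer values, already ordered.
by_nlargest = sample_df.nlargest(3, "Summer")

# Both keep the original index labels; reset_index(drop=True) replaces them
# with 0, 1, 2, ... which is convenient when the position should act as a rank.
ranked = by_nlargest.reset_index(drop=True)

print(ranked)
```

Both approaches keep the original row labels, which is why the sorted output above shows index values such as 36 and 27 rather than 0 and 1.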