# Chapter 1: Introduction to Python for Data Science

#### Getting Started

Before we can begin exploring our data, we'll load two popular Python packages for data science.
The pandas package has built-in features for reading, sorting, summarizing, and grouping tabular data.
NumPy lets us work efficiently with arrays and provides functions for more advanced mathematical operations.

In [1]:
```python
# Importing a package under a short alias (pd, np) lets us refer to it
# later without typing out the full package name every time
import pandas as pd
import numpy as np
```
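Here is a minimal sketch of the features described above: sorting, grouping and summarizing with pandas, and element-wise math with NumPy. The tiny DataFrame `df` is a hypothetical stand-in. Its values are copied from the first five rows of the dataset previewed later in this chapter.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in table; values copied from the dataset preview
df = pd.DataFrame({
    "STUSAB": ["AL", "AZ", "AR", "CA", "CO"],
    "max_warming_season": ["Winter", "Winter", "Winter", "Fall", "Winter"],
    "Annual": [-0.035048, 1.31988, 0.214074, 1.480561, 1.438589],
})

# pandas: sort rows by a column (largest annual change first)
sorted_df = df.sort_values("Annual", ascending=False)

# pandas: group rows by a column and summarize each group
season_means = df.groupby("max_warming_season")["Annual"].mean()

# NumPy: apply a math operation to a whole array at once
annual = df["Annual"].to_numpy()
rounded = np.round(annual, 2)

print(sorted_df)
print(season_means)
print(rounded)
```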

## Recipe 1: Data Exploration with Python


#### Loading and Viewing the Dataset
We begin by loading the dataset and looking at the first few rows. This step gives us a glimpse into the structure of the dataset, helping us understand the type of data we're dealing with.

In [3]:
```python
# Load the data
file_path = 'data/model_state.csv'

# pd.read_csv() reads a CSV file and returns a DataFrame containing its data
data = pd.read_csv(file_path)
```
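If `data/model_state.csv` is not at hand, you can still try the same `pd.read_csv` call on an in-memory string. `read_csv` accepts any file-like object, so wrapping text in `io.StringIO` works like opening a file. The CSV text `sample_csv` is a hypothetical stand-in made of the header and the first three rows shown in the preview of the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/model_state.csv: header plus three preview rows
sample_csv = """fips,Fall,Spring,Summer,Winter,max_warming_season,Annual,STUSAB,STATE_NAME,STATENS
1,-0.195668,-0.105862,-0.325009,0.458526,Winter,-0.035048,AL,Alabama,1779775
4,1.203951,1.38448,1.274455,1.388388,Winter,1.31988,AZ,Arizona,1779777
5,-0.04254,0.266399,0.058596,0.532247,Winter,0.214074,AR,Arkansas,68085
"""

# read_csv treats the StringIO object like an open file
sample = pd.read_csv(io.StringIO(sample_csv))

# shape is (number of rows, number of columns)
print(sample.shape)   # (3, 10)
```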

We can check out the first few rows of the data using the `.head()` method:

In [5]:
```python
first_few_rows = data.head()

# just typing the variable at the end of the cell will display it
first_few_rows
```

|   | fips | Fall | Spring | Summer | Winter | max_warming_season | Annual | STUSAB | STATE_NAME | STATENS |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | -0.195668 | -0.105862 | -0.325009 | 0.458526 | Winter | -0.035048 | AL | Alabama | 1779775 |
| 1 | 4 | 1.203951 | 1.38448 | 1.274455 | 1.388388 | Winter | 1.31988 | AZ | Arizona | 1779777 |
| 2 | 5 | -0.04254 | 0.266399 | 0.058596 | 0.532247 | Winter | 0.214074 | AR | Arkansas | 68085 |
| 3 | 6 | 1.570921 | 1.449242 | 1.478335 | 1.41243 | Fall | 1.480561 | CA | California | 1779778 |
| 4 | 8 | 1.055309 | 1.43691 | 1.367845 | 1.838758 | Winter | 1.438589 | CO | Colorado | 1779779 |
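By default, `.head()` returns the first five rows. It also accepts a row count, and `.tail()` does the same from the end of the table. The following sketch uses a small hypothetical stand-in, `preview`, built from two columns of the preview rows above, in place of the full `data`.

```python
import pandas as pd

# Hypothetical stand-in for `data`: two columns from the five preview rows
preview = pd.DataFrame({
    "STUSAB": ["AL", "AZ", "AR", "CA", "CO"],
    "Annual": [-0.035048, 1.31988, 0.214074, 1.480561, 1.438589],
})

first_two = preview.head(2)   # first 2 rows
last_two = preview.tail(2)    # last 2 rows

print(first_two)
print(last_two)
print(len(preview))           # total number of rows: 5
```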


#### Checking Data Types
Next, we can check the data types of each column. Understanding the data types is crucial for data manipulation and choosing the right analysis techniques.

In [6]:
```python
data_types = data.dtypes
data_types
```

```
fips                    int64
Fall                  float64
Spring                float64
Summer                float64
Winter                float64
max_warming_season     object
Annual                float64
STUSAB                 object
STATE_NAME             object
STATENS                 int64
dtype: object
```

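We can see that we have two integer columns (`fips` and `STATENS`), five floating-point columns (numbers with a decimal point), and three `object` columns (`max_warming_season`, `STUSAB` and `STATE_NAME`). `object` is pandas' general-purpose dtype for arbitrary Python objects. Here those three columns hold text (strings), though a column that mixes several types would also be stored as `object`.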
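Dtypes are useful for selecting columns by kind. This sketch uses a hypothetical three-column stand-in for `data`. `select_dtypes(include="number")` keeps the integer and float columns, and a value pulled from the text column turns out to be an ordinary Python `str`. Depending on your pandas version, text columns may display with a dedicated string dtype instead of `object`.

```python
import pandas as pd

# Hypothetical stand-in for `data` with one column of each kind seen above
df = pd.DataFrame({
    "fips": [1, 4, 5],
    "Annual": [-0.035048, 1.31988, 0.214074],
    "STATE_NAME": ["Alabama", "Arizona", "Arkansas"],
})

print(df.dtypes)

# Keep only the numeric (integer and float) columns
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# A single value from the text column is a plain Python str
value_type = type(df["STATE_NAME"].iloc[0])

print(numeric_cols)   # ['fips', 'Annual']
print(value_type)
```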

#### Calculating Basic Statistics
Lastly, we can calculate basic statistics for the numerical columns. These statistics include count, mean, standard deviation, minimum and maximum values, and quartiles.

In [7]:
```python
basic_statistics = data.describe()
basic_statistics
```

|   | fips | Fall | Spring | Summer | Winter | Annual | STATENS |
|---|---|---|---|---|---|---|---|
| count | 48.0 | 48.0 | 48.0 | 48.0 | 48.0 | 48.0 | 48.0 |
| mean | 30.1875 | 0.785324 | 1.00428 | 0.773815 | 1.668654 | 1.060972 | 1480213.0 |
| std | 15.448826 | 0.523901 | 0.480059 | 0.630515 | 0.713727 | 0.545513 | 489543.7 |
| min | 1.0 | -0.195668 | -0.105862 | -0.325009 | 0.339203 | -0.035048 | 68085.0 |
| 25% | 18.75 | 0.361859 | 0.72437 | 0.212762 | 1.191115 | 0.641008 | 1203653.0 |
| 50% | 30.5 | 0.769284 | 1.074631 | 0.866977 | 1.507633 | 1.102286 | 1779782.0 |
| 75% | 42.5 | 1.163273 | 1.358113 | 1.280039 | 2.27448 | 1.518205 | 1779795.0 |
| max | 56.0 | 1.655732 | 1.759266 | 2.114864 | 3.145933 | 2.038868 | 1785534.0 |


We can interpret this by reading across the statistics and noticing that:

- Each column has 48 values (`count`).
- The average annual temperature change across states was +1.060972 (`mean`).
- The largest Winter temperature change for any state was 3.145933 (`max`).
- The median Fall temperature change was 0.769284 (`50%`).

Because `fips` and `STATENS` are numeric identifiers rather than measurements, their statistics are not meaningful.
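Each number quoted above can also be pulled out of the `describe()` result directly, using `.loc[row_label, column]`. For a text column, `.describe()` reports `count`, `unique`, `top` (most common value) and `freq` instead. The sketch uses a hypothetical stand-in `df` built from the Winter and `max_warming_season` values of the five preview rows.

```python
import pandas as pd

# Hypothetical stand-in for `data`: two columns from the five preview rows
df = pd.DataFrame({
    "Winter": [0.458526, 1.388388, 0.532247, 1.41243, 1.838758],
    "max_warming_season": ["Winter", "Winter", "Winter", "Fall", "Winter"],
})

stats = df.describe()                      # numeric columns only by default
winter_max = stats.loc["max", "Winter"]    # one statistic: row label, column

# Describing a text column gives count, unique, top and freq
season_summary = df["max_warming_season"].describe()

print(winter_max)       # 1.838758
print(season_summary)
```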

## Recipe 2: NumPy Operations on Data

In this recipe, we focus on performing statistical operations using NumPy on a specific season's temperature data. We chose the 'Summer' season for this example.



#### Extracting Summer Temperature Data
First, we extract the 'Summer' temperature data from our dataset. This gives us a NumPy array of the summer temperature change for each state.

In [8]:
```python
# Using brackets ['Column_Name'] after a pandas DataFrame selects the "Column_Name" column
# A single column is a Series; .values returns its data as a NumPy array
# (for a numeric column like this one, .to_numpy() returns the same array)
summer_temps = data['Summer'].values
summer_temps
```

```
array([-0.32500882,  1.27445503,  0.05859612,  1.4783351 ,  1.3678448 ,
        1.58062787,  1.52287831,  0.91445503, -0.01605644,  1.07271958,
        0.21748148,  0.01807407,  0.06171429,  0.49185185, -0.19575309,
        0.19860317,  1.42756966,  1.18030335,  1.59461023,  1.05052557,
        0.90725926, -0.20665961,  0.01409524,  1.06418342,  0.5089806 ,
        1.35301587,  1.29679012,  1.7387231 ,  1.10857143,  0.70418342,
        0.39226808,  1.00040917,  0.2531358 ,  0.07994356,  1.61322046,
        0.53399647,  2.1148642 ,  0.29062434,  0.82669489, -0.01742504,
        0.35722046,  1.70275838,  1.04533333,  0.49347443,  1.15332628,
        0.10457848,  0.47966138,  1.25605644])
```
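Selecting a column and converting it to an array are two distinct steps. `df["Summer"]` is a pandas Series, which carries values plus an index, while `.to_numpy()` (or `.values`) returns a plain NumPy `ndarray`. The sketch uses a hypothetical stand-in `df` holding the Summer values of the five preview rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: Summer values from the five preview rows
df = pd.DataFrame({"Summer": [-0.325009, 1.274455, 0.058596, 1.478335, 1.367845]})

summer_series = df["Summer"]              # pandas Series (values + index)
summer_array = df["Summer"].to_numpy()    # plain NumPy ndarray

print(type(summer_series).__name__)   # Series
print(type(summer_array).__name__)    # ndarray
print(summer_array.shape)             # (5,)
```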

#### Calculating Statistical Measures
Using NumPy, we can calculate the following statistical measures for the Summer temperature data:

1. Mean Temperature: The average temperature change during the summer across all states.

In [10]:
```python
mean_summer_temp = np.mean(summer_temps)
print("Mean summer temperature is:", mean_summer_temp)
```

```
Mean summer temperature is: 0.7738148148148148
```
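The mean is simply the sum of the values divided by how many there are, so `np.mean` can be checked by hand. This sketch uses a small hypothetical stand-in array, `summer`, made of the five Summer values from the preview rows.

```python
import numpy as np

# Hypothetical stand-in array: Summer values from the five preview rows
summer = np.array([-0.325009, 1.274455, 0.058596, 1.478335, 1.367845])

# Mean = sum of values / number of values
manual_mean = summer.sum() / summer.size
numpy_mean = np.mean(summer)

print(manual_mean, numpy_mean)
```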


2. Median Temperature: The median temperature change, providing a middle value that separates the higher half from the lower half of the temperature data.

In [11]:
```python
median_summer_temp = np.median(summer_temps)
print("Median summer temperature is:", median_summer_temp)
```

```
Median summer temperature is: 0.8669770723103998
```
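`summer_temps` has 48 values, an even count, so there is no single middle element. In that case `np.median` averages the two middle values of the sorted data. The sketch below reproduces this on a hypothetical even-length stand-in array holding the first four Summer values from the preview.

```python
import numpy as np

# Hypothetical even-length stand-in: first four Summer values from the preview
values = np.array([-0.325009, 1.274455, 0.058596, 1.478335])

sorted_vals = np.sort(values)
mid = sorted_vals.size // 2

# With an even count, the median is the mean of the two middle values
manual_median = (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

print(manual_median, np.median(values))
```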


3. Standard Deviation: This measures the amount of variation or dispersion in the summer temperature data.

In [12]:
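```python
# np.std computes the population standard deviation by default (ddof=0)
std_dev_summer_temp = np.std(summer_temps)
print("Standard deviation of summer temperature is:", std_dev_summer_temp)
```

```
Standard deviation of summer temperature is: 0.6239120874061266
```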
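This value is slightly smaller than the Summer `std` of 0.630515 in the `describe()` table. The difference comes from the divisor:

- `np.std` defaults to the population standard deviation (`ddof=0`, dividing by n).
- pandas' `.std()` and `describe()` use the sample standard deviation (`ddof=1`, dividing by n - 1).

With n = 48, multiplying by √(48/47) ≈ 1.0106 turns 0.623912 into about 0.630515. The sketch below shows the same relationship on a hypothetical stand-in array holding the five Summer values from the preview.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in array: Summer values from the five preview rows
summer = np.array([-0.325009, 1.274455, 0.058596, 1.478335, 1.367845])

population_std = np.std(summer)        # ddof=0: divide by n
sample_std = np.std(summer, ddof=1)    # ddof=1: divide by n - 1
pandas_std = pd.Series(summer).std()   # pandas defaults to ddof=1

# Scaling by sqrt(n / (n - 1)) converts the population value to the sample value
n = summer.size
converted = population_std * np.sqrt(n / (n - 1))

print(population_std, sample_std, pandas_std, converted)
```

To match the `describe()` output exactly, pass `ddof=1` to `np.std`.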
