# Exploring Weather Trends

#### Udacity Data Analyst Nanodegree - Project 1

## Import Statements

*pandas* - Used for reading and managing the provided dataset

*plotly* - Used for creating interactive graphs and visualizations

In [1]:
import pandas
import plotly

## Reading dataset

In [2]:
city_data = pandas.read_csv('./data/city_data.csv')
global_data = pandas.read_csv('./data/global_data.csv')

We will focus on just two cities, **Bangalore** and **New York**, along with the global average temperatures. More cities can be added if needed.

First, find the row indices where the temperature data for these cities is located in the dataset.

In [3]:
bangalore_rows = city_data.index[city_data['city'] == 'Bangalore']
new_york_rows = city_data.index[city_data['city'] == 'New York']

Next, extract those rows from the dataset and save them as pandas DataFrames.

In [4]:
# city_data.index[...] returns index labels, so select with .loc rather than .iloc
new_york_data = city_data.loc[new_york_rows].copy()
bangalore_data = city_data.loc[bangalore_rows].copy()
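The two steps above (locating the row indices, then extracting the rows) can also be collapsed into a single boolean-mask selection. The following is a minimal sketch, not the notebook's own method: the `sample` frame and the `select_city` helper are hypothetical names introduced only for illustration, and the values are a tiny stand-in for `city_data`.

```python
import pandas

# Hypothetical miniature of city_data, for illustration only.
sample = pandas.DataFrame({
    'year': [1743, 1744, 1796, 1797],
    'city': ['New York', 'New York', 'Bangalore', 'Bangalore'],
    'country': ['United States', 'United States', 'India', 'India'],
    'avg_temp': [3.26, 11.66, 24.49, 25.18],
})


def select_city(frame, city_name):
    """Return a copy of the rows of `frame` whose 'city' column equals `city_name`."""
    return frame[frame['city'] == city_name].copy()


bangalore_sample = select_city(sample, 'Bangalore')
new_york_sample = select_city(sample, 'New York')
print(bangalore_sample)
```

Adding another city then only requires one more call to the helper with a different name.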

In [5]:
new_york_data.head()

Unnamed: 0,year,city,country,avg_temp
46341,1743,New York,United States,3.26
46342,1744,New York,United States,11.66
46343,1745,New York,United States,1.13
46344,1746,New York,United States,
46345,1747,New York,United States,


In [6]:
bangalore_data.head()

Unnamed: 0,year,city,country,avg_temp
6367,1796,Bangalore,India,24.49
6368,1797,Bangalore,India,25.18
6369,1798,Bangalore,India,24.65
6370,1799,Bangalore,India,24.81
6371,1800,Bangalore,India,24.85


In [7]:
global_data.head()

Unnamed: 0,year,avg_temp
0,1750,8.72
1,1751,7.98
2,1752,5.78
3,1753,8.39
4,1754,8.47


## Processing data

We do not need the city and country columns. We are only concerned with the average temperature for each year, so the unwanted columns can be dropped.

In [8]:
new_york_data = new_york_data.drop(columns=['city', 'country'])
bangalore_data = bangalore_data.drop(columns=['city', 'country'])

There are some NaN values in the average temperature column of the city data.

In [9]:
new_york_data.isna().sum()

year        0
avg_temp    5
dtype: int64

In [10]:
bangalore_data.isna().sum()

year        0
avg_temp    7
dtype: int64

In [11]:
global_data.isna().sum()

year        0
avg_temp    0
dtype: int64

The number of NaN values is small, so we can fill them in with pandas. Here we forward-fill, carrying the last valid observation into each gap.

In [12]:
# .ffill() is the forward fill; fillna(method='ffill') is deprecated in recent pandas
new_york_data = new_york_data.ffill()
bangalore_data = bangalore_data.ffill()
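To make the effect of a forward fill concrete, here is a small sketch on a hypothetical Series. The first five values echo the New York rows shown earlier; the 1748 value is made up for illustration. It uses `Series.ffill()`, which performs a forward fill.

```python
import pandas

# Years 1743-1747 mirror the New York head() output; 1748 is a made-up value.
temps = pandas.Series(
    [3.26, 11.66, 1.13, None, None, 9.0],
    index=[1743, 1744, 1745, 1746, 1747, 1748],
)

# Each gap takes the most recent valid value that precedes it.
filled = temps.ffill()
print(filled.tolist())  # [3.26, 11.66, 1.13, 1.13, 1.13, 9.0]
```

Note that a forward fill cannot repair a gap at the very start of a series, because there is no earlier value to carry forward.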

In [13]:
new_york_data.isna().sum()

year        0
avg_temp    0
dtype: int64

In [14]:
bangalore_data.isna().sum()

year        0
avg_temp    0
dtype: int64

We can now merge all the DataFrames into a single DataFrame. The **year** column is common to all three, so it can serve as the merge key and then as the index of the merged data.

In [15]:
merged = global_data.merge(new_york_data, on='year').merge(bangalore_data, on='year')
merged.columns = ['year', 'Global', 'NYC', 'Bangalore']
merged = merged.set_index('year')
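Because every frame has a column named `avg_temp`, merging them directly makes pandas add suffixes to the clashing names, which is why the columns are relabelled afterwards. One possible variation, sketched below on hypothetical three-year frames (all names ending in `_sample` and all values are illustrative assumptions), is to rename each column before merging so that no positional relabelling is needed.

```python
import pandas

# Hypothetical frames; the values are illustrative only.
global_sample = pandas.DataFrame({'year': [1796, 1797, 1798], 'avg_temp': [8.27, 8.51, 8.67]})
nyc_sample = pandas.DataFrame({'year': [1796, 1797, 1798], 'avg_temp': [9.6, 9.1, 9.8]})
blr_sample = pandas.DataFrame({'year': [1797, 1798, 1799], 'avg_temp': [25.18, 24.65, 24.81]})

# Rename first, then merge on 'year'. merge() defaults to an inner join,
# so only years present in all three frames survive.
merged_sample = (
    global_sample.rename(columns={'avg_temp': 'Global'})
    .merge(nyc_sample.rename(columns={'avg_temp': 'NYC'}), on='year')
    .merge(blr_sample.rename(columns={'avg_temp': 'Bangalore'}), on='year')
    .set_index('year')
)
print(merged_sample)
```

The inner join also explains why the merged data starts only at the first year for which all three series have a row.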

## Plotting

In [16]:
plotly.offline.init_notebook_mode(connected=True)
plotly.offline.iplot({'data': [{'x': merged.index, 
                                'y': merged[col], 
                                'name': col
                               } for col in merged.columns],
                      'layout': plotly.graph_objs.Layout(title='Average temperature trend',
                                                         xaxis=dict(title='Year'),
                                                         yaxis=dict(title='Average Temperature in Celsius'))
                     }, filename='average-temperature-trend')

The raw plot is jagged and hard to read.

A moving average smooths the curves and makes the plot easier to read. A centered moving window of size 3 gives a satisfactory result.
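As a quick illustration of what a centered window of size 3 does, the sketch below applies one to a short hypothetical Series (the values are made up). Each point becomes the mean of itself and its two neighbours. With `min_periods=1`, the first and last points are averaged over the two values that are available rather than left as NaN.

```python
import pandas

# Made-up values, chosen so the averages are easy to check by hand.
values = pandas.Series([1.0, 2.0, 6.0, 4.0, 5.0])

smoothed = values.rolling(window=3, center=True, min_periods=1).mean()
print(smoothed.round(2).tolist())  # [1.5, 3.0, 4.0, 5.0, 4.5]
```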

In [17]:
merged_mean = merged.rolling(window=3, center=True, min_periods=1).mean()
plotly.offline.init_notebook_mode(connected=True)
plotly.offline.iplot({'data': [{'x': merged_mean.index, 
                                'y': merged_mean[col], 
                                'name': col
                               } for col in merged_mean.columns],
                      'layout': plotly.graph_objs.Layout(title='Average temperature trend',
                                                         xaxis=dict(title='Year'),
                                                         yaxis=dict(title='Average Temperature in Celsius'))
                     }, filename='average-temperature-trend')

Bangalore's temperature is consistently higher than New York's, by a factor of roughly 2.5. Despite this offset, the ups and downs in Bangalore's temperature coincide with most of those in the global average temperature. New York's temperature is more erratic, with many peaks and troughs, but it too follows the trend of the global average.

In [28]:
merged_large_mean = merged.rolling(window=len(merged), center=True, min_periods=1).mean()
plotly.offline.init_notebook_mode(connected=True)
plotly.offline.iplot({'data': [{'x': merged_large_mean.index, 
                                'y': merged_large_mean[col], 
                                'name': col
                               } for col in merged_large_mean.columns],
                      'layout': plotly.graph_objs.Layout(title='Average temperature trend',
                                                         xaxis=dict(title='Year'),
                                                         yaxis=dict(title='Average Temperature in Celsius'))
                     }, filename='average-temperature-trend')

With this much smoothing, we can see that the temperature has been slowly rising over the past few centuries.
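The last plot uses a window as wide as the whole dataset. With `center=True` and `min_periods=1`, the window is clipped at the edges, so early years are averaged mostly with what follows them and late years mostly with what precedes them. A minimal sketch on a hypothetical, steadily rising Series (made-up values) shows why the result is a very smooth curve rather than a flat line:

```python
import pandas

# Made-up, steadily rising values standing in for a temperature series.
rising = pandas.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# The window spans the whole series but is clipped at both ends.
wide = rising.rolling(window=len(rising), center=True, min_periods=1).mean()
print(wide.round(2).tolist())  # [2.0, 2.5, 3.0, 3.5, 4.0]
```

Only the middle point is averaged over every value. The smoothed series still rises, just less steeply than the raw one, which is what allows the long-term trend to show through.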

## Summary