# Data
### Overview :
The rising average temperature of Earth's climate system, called global warming, is driving changes in rainfall patterns, extreme weather, arrival of seasons, and more. Collectively, global warming and its effects are known as climate change. While there have been prehistoric periods of global warming, observed changes since the mid-20th century have been unprecedented in rate and scale.
A dataset of daily temperatures for major cities around the world helps analyze these changes. Weather data is also useful for many data science tasks, such as sales forecasting and logistics.
The data is available for research and non-commercial purposes only.
### License :
http://academic.udayton.edu/kissock/http/Weather/default.htm
### Content :
Daily average temperature values are provided in the city_temperature.csv file.
### Acknowledgements :
Thanks to the University of Dayton for making this dataset available in the first place!

The data contributor : https://www.kaggle.com/sudalairajkumar


# Data Preparing

### 1. Importing the required libraries


In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt

!pip install plotly
!pip install chart_studio

import plotly.tools as tls
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from chart_studio import plotly as py  # the name py refers to chart_studio.plotly

%matplotlib inline


### 2. Loading the data into the data frame + Exploring The Data


In [None]:
df = pd.read_csv("../input/daily-temperature-of-major-cities/city_temperature.csv")
df.head()

In [None]:
len(df.Country.unique())

In [None]:
df.tail()

In [None]:
df.shape

In [None]:
df.info()

### 3. Dropping the duplicate rows

In [None]:
df = df.drop_duplicates()
df.shape

In [None]:
df.count()
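To see how many rows the de-duplication step removes, the duplicates can be counted before they are dropped. The following is a minimal sketch on a made-up toy frame (the `toy` data and its values are assumptions for illustration, not rows from the real file):

```python
import pandas as pd

# Toy frame standing in for the temperature data (values are made up).
toy = pd.DataFrame({
    "City": ["Algiers", "Algiers", "Algiers", "Tunis"],
    "Year": [1995, 1995, 1995, 1995],
    "Month": [1, 1, 1, 1],
    "Day": [1, 1, 2, 1],
    "AvgTemperature": [64.2, 64.2, 49.4, 58.1],
})

# duplicated() marks every row that is identical to an earlier row.
n_duplicates = int(toy.duplicated().sum())
deduplicated = toy.drop_duplicates()

print(f"{n_duplicates} duplicate row(s) removed, {len(deduplicated)} rows kept")
```

The same two calls can be applied to `df` before it is reassigned.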

### 4. Dealing with the missing or null values

In [None]:
for col in df.columns:  # check every column for missing values (NaN)
    print(f"The {col} column contains NaN: {df[col].isna().any()}")

In [None]:
for col in df.columns:  # check every column for zero values
    print(f"The {col} column contains 0: {(df[col] == 0).any()}")
df = df[df.Day != 0]  # a Day of 0 is not a valid calendar day, so drop those rows
df.head()

In [None]:
# drop the rows whose Year is the malformed value 200 or 201
df = df[(df.Year!=200) & (df.Year!=201)]
df.head()

No column contains NaN values, and the rows with an invalid Day or Year have been dropped. Our data is **ready**
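The checks above look only for NaN and 0. If the file marks missing readings with a numeric placeholder instead, those rows pass both checks and pull every average down. The sketch below assumes the placeholder is -99; that value is an assumption to confirm against the data (for example with `df.AvgTemperature.min()`), and the `toy` frame is made up:

```python
import numpy as np
import pandas as pd

MISSING_SENTINEL = -99.0  # assumption: placeholder used for missing readings

# Toy frame with two placeholder readings (values are made up).
toy = pd.DataFrame({
    "City": ["Algiers"] * 4,
    "AvgTemperature": [64.2, -99.0, 49.4, -99.0],
})

# Count how many readings carry the placeholder value.
n_sentinel = int((toy["AvgTemperature"] == MISSING_SENTINEL).sum())

# Replace the placeholder with NaN so that mean() skips those rows.
cleaned = toy.assign(
    AvgTemperature=toy["AvgTemperature"].replace(MISSING_SENTINEL, np.nan)
)

print(n_sentinel, cleaned["AvgTemperature"].mean())
```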

# Exploratory Data Analysis (EDA)

### 1. Average Temperature in every region

In [None]:
Average_Temperture_in_every_region = df.groupby("Region")["AvgTemperature"].mean().sort_values(ascending = False)
Average_Temperture_in_every_region = Average_Temperture_in_every_region.rename({"South/Central America & Carribean":"South America","Australia/South Pacific":"Australia"})
Average_Temperture_in_every_region

In [None]:
plt.figure(figsize = (15,8))
plt.bar(Average_Temperture_in_every_region.index,Average_Temperture_in_every_region.values)
plt.xticks(rotation = 10,size = 15)
plt.yticks(size = 15)
plt.ylabel("Average Temperature",size = 15)
plt.title("Average Temperature in every region",size = 20)
plt.show()

### 2. Growth of the average Temperature in every region over time

In [None]:
# build a datetime index from the Year, Month and Day columns
datetime_series = pd.to_datetime(df[['Year','Month', 'Day']])
df['date'] = datetime_series
df = df.set_index('date')
df = df.drop(["Month","Day","Year"],axis = 1)
df.head()
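`pd.to_datetime` raises an error when a Year/Month/Day combination is not a real calendar date, which is why invalid rows have to be removed first. As a hedged alternative, `errors="coerce"` turns such rows into `NaT` so they can be dropped in one step. This is a sketch on a made-up `toy` frame, not the notebook's own method:

```python
import pandas as pd

# Toy frame: the second and fourth rows are not real calendar dates.
toy = pd.DataFrame({
    "Year": [1995, 1995, 1996, 1996],
    "Month": [1, 2, 1, 2],
    "Day": [1, 0, 1, 30],  # Day 0 and 30 February do not exist
    "AvgTemperature": [64.2, 50.1, 48.0, 47.3],
})

# Invalid combinations become NaT instead of raising an exception.
dates = pd.to_datetime(toy[["Year", "Month", "Day"]], errors="coerce")

toy["date"] = dates
indexed = toy.dropna(subset=["date"]).set_index("date")

print(indexed)
```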

In [None]:
region_year = ['Region', pd.Grouper(freq='Y')]
# average only the numeric column; Country, State and City hold text
df_region = df.groupby(region_year).mean(numeric_only=True)
df_region.head()
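The yearly mean per region can also be computed by grouping on the year of the datetime index, which avoids frequency aliases altogether. A minimal sketch on made-up data (the `toy` frame and its values are assumptions for illustration):

```python
import pandas as pd

# Toy frame with a datetime index, mimicking the layout of df (values made up).
toy = pd.DataFrame(
    {
        "Region": ["Africa", "Africa", "Europe", "Europe", "Africa", "Europe"],
        "City": ["Algiers", "Algiers", "Paris", "Paris", "Algiers", "Paris"],
        "AvgTemperature": [60.0, 70.0, 40.0, 50.0, 66.0, 46.0],
    },
    index=pd.to_datetime(
        ["1995-01-01", "1995-07-01", "1995-01-01", "1995-07-01",
         "1996-01-01", "1996-01-01"]
    ),
)

# Group by region and by the calendar year taken from the index.
yearly = (
    toy.groupby(["Region", toy.index.year])["AvgTemperature"]
    .mean()
    .rename_axis(["Region", "Year"])
)

# One column per region, one row per year: convenient for line plots.
print(yearly.unstack("Region"))
```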

In [None]:
plt.figure(figsize = (15,8))
for region in df["Region"].unique():

    plt.plot((df_region.loc[region]).index,df_region.loc[region]["AvgTemperature"],label = region) 
    
plt.legend()
plt.title("Growth of the average Temperature in every region over time",size = 20)
plt.xticks(size = 15)
plt.yticks(size = 15)
plt.show()

### 3. Growth of the average Temperature (Earth)

In [None]:
df_earth = df.groupby([pd.Grouper(freq = "Y")]).mean(numeric_only = True)  # average the numeric column only
df_earth.head()

In [None]:
plt.figure(figsize = (15,8))
plt.plot(df_earth.index,df_earth.values,marker ="o")
plt.xticks(size =15)
plt.ylabel("Average Temperature",size = 15)
plt.yticks(size =15)
plt.title("Growth of the average Temperature (Earth)",size =20)
plt.show()
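The plot shows the growth visually. One way to put a number on it is to fit a straight line through the yearly means with `np.polyfit`. This is an added illustration rather than part of the original analysis, and the `yearly_mean` values below are made up; in the notebook they would come from `df_earth`:

```python
import numpy as np
import pandas as pd

# Made-up yearly means, indexed by calendar year.
yearly_mean = pd.Series(
    [60.0, 60.5, 61.0, 61.5, 62.0],
    index=[1995, 1996, 1997, 1998, 1999],
)

# Use "years since the first year" as x to keep the fit well conditioned.
years = yearly_mean.index.to_numpy()
x = years - years[0]

# A degree-1 fit returns the slope (change per year) and the intercept.
slope, intercept = np.polyfit(x, yearly_mean.to_numpy(), deg=1)

print(f"trend: {slope:.2f} degrees per year")
```

A linear fit is only a rough summary; it says nothing about whether the trend is statistically significant.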

### 3. The hottest Cities in The world

In [None]:
top_10_hotest_Cities_in_The_world = df.groupby("City").mean(numeric_only = True).sort_values(by = "AvgTemperature", ascending = False).head(10)
top_10_hotest_Cities_in_The_world
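Selecting the hottest cities is a "top N" query, which pandas also exposes directly through `Series.nlargest`. A minimal sketch on a made-up `toy` frame (city values are invented for illustration):

```python
import pandas as pd

# Toy readings for three cities (values are made up).
toy = pd.DataFrame({
    "City": ["Dubai", "Dubai", "Kuwait", "Kuwait", "Oslo", "Oslo"],
    "AvgTemperature": [90.0, 80.0, 85.0, 75.0, 40.0, 30.0],
})

# Mean per city, then keep the N largest means in descending order.
city_means = toy.groupby("City")["AvgTemperature"].mean()
top = city_means.nlargest(2)

print(top)
```

`nlargest` returns a Series, so a bar plot would use `top.index` and `top.values`.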

In [None]:
plt.figure(figsize = (15,8))
plt.barh(top_10_hotest_Cities_in_The_world.index,top_10_hotest_Cities_in_The_world.AvgTemperature)

### 4. The Growth of the Temperature in the hottest Cities in The world

In [None]:
city_year = ['City', pd.Grouper(freq='Y')]
df_city = df.groupby(city_year).mean(numeric_only = True)  # average the numeric column only
df_city.head()

In [None]:
plt.figure(figsize = (20,8))
for city in top_10_hotest_Cities_in_The_world.index:
    plt.plot(df_city.loc[city].index,df_city.loc[city].AvgTemperature,label = city)
plt.legend()
plt.yticks(size = 15)
plt.xticks(size = 15)
plt.ylabel("Average Temperature",size = 15)
plt.title("The Growth of the Temperature in the hottest Cities in The world",size = 20)
plt.show()

### 5. The hottest Countries in The world

In [None]:
hotest_Countries_in_The_world = df.groupby("Country").mean(numeric_only = True).sort_values(by = "AvgTemperature")
hotest_Countries_in_The_world.tail()

In [None]:
plt.figure(figsize = (20,8))
plt.bar(hotest_Countries_in_The_world.index[-1:-33:-1],hotest_Countries_in_The_world.AvgTemperature[-1:-33:-1])
plt.yticks(size = 15)
plt.ylabel("Average Temperature",size = 15)
plt.xticks(rotation = 90,size = 12)
plt.title("The hottest Countries in The world",size = 20)
plt.show()

### 7. The Average Temperature around the world

Plotly's choropleth maps identify countries by code, so we need the ISO codes of the countries.

#### Country codes data:

https://www.kaggle.com/juanumusic/countries-iso-codes/data

In [None]:
code = pd.read_csv("../input/countries-iso-codes/wikipedia-iso-country-codes.csv") # this is for the country codes
code = code.set_index("English short name lower case")
code.head()

I renamed some countries in the code data frame so that they match the index of our main data frame.

##### This is important when merging the two data frames

In [None]:
code = code.rename(index = {"United States Of America":"US","Côte d'Ivoire":"Ivory Coast","Korea, Republic of (South Korea)":"South Korea","Netherlands":"The Netherlands","Syrian Arab Republic":"Syria","Myanmar":"Myanmar (Burma)","Korea, Democratic People's Republic of":"North Korea","Macedonia, the former Yugoslav Republic of":"Macedonia","Ecuador":"Equador","Tanzania, United Republic of":"Tanzania","Serbia":"Serbia-Montenegro"})
code.head()

##### Now we do the merging between the code data frame and our data

In [None]:
hott = pd.merge(hotest_Countries_in_The_world,code,left_index = True , right_index = True , how = "left")
hott.head()
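With a left merge, any country whose name has no match in the codes table keeps its temperature but gets NaN as its code, and it then silently disappears from the map. A quick check for such rows can look like this sketch (the two small frames and their temperature values are made up for illustration):

```python
import pandas as pd

# Toy temperature table indexed by country name (values are made up).
temps = pd.DataFrame(
    {"AvgTemperature": [82.0, 55.0, 60.0]},
    index=["United Arab Emirates", "US", "Ivory Coast"],
)

# Toy codes table that uses different spellings for two of the countries.
codes = pd.DataFrame(
    {"Alpha-3 code": ["ARE", "USA", "CIV"]},
    index=["United Arab Emirates", "United States Of America", "Côte d'Ivoire"],
)

merged = temps.merge(codes, left_index=True, right_index=True, how="left")

# Countries without a code still need an entry in the rename mapping.
unmatched = merged.index[merged["Alpha-3 code"].isna()].tolist()

print(unmatched)
```

An empty list means every country in the temperature data received a code.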

In [None]:
data = [dict(type = "choropleth",autocolorscale = False, locations=  hott["Alpha-3 code"], z = hott["AvgTemperature"] ,
              text = hott.index,colorscale = "reds",colorbar = dict(title = "Temperature"))]                         

In [None]:
layout = dict(title = "The Average Temperature around the world",geo = dict(scope = "world",projection = dict(type = "equirectangular"),showlakes = True,lakecolor = "rgb(66,165,245)",),)

In [None]:
fig = dict(data = data,layout=layout)
iplot(fig,filename = "d3-choropleth-map")

### 8. Variation of the mean Temperature Over The 12 months around the world

In [None]:
Variation_world = df.groupby(df.index.month).mean(numeric_only = True)  # average the numeric column only
Variation_world = Variation_world.rename(index = {1:"January",2:"February" ,3:"March" ,4:"April" ,5:"May" ,6:"June" ,7:"July" ,8:"August" ,9:"September" ,10:"October" ,11:"November" ,12:"December" })
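The month-number-to-name mapping is written out by hand several times in this notebook. It can also be built once from the standard library's `calendar.month_name`. This is a small sketch; the `monthly` values are made up:

```python
import calendar

import pandas as pd

# calendar.month_name[1] is "January" ... [12] is "December"
# (English names under the default locale).
month_names = {i: calendar.month_name[i] for i in range(1, 13)}

# Toy monthly means indexed by month number (values are made up).
monthly = pd.Series([50.0, 52.0, 58.0], index=[1, 2, 3], name="AvgTemperature")

# rename() leaves the values untouched and only relabels the index.
named = monthly.rename(index=month_names)

print(named)
```

The same `month_names` dictionary can be passed to `rename(index=...)` on any of the monthly data frames.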

In [None]:
plt.figure(figsize=(18,8))
sns.barplot(x=Variation_world.index, y= 'AvgTemperature',data=Variation_world,palette='Set2')
plt.title('AVERAGE MEAN TEMPERATURE OF THE WORLD',size = 25)
plt.xticks(size = 15)
plt.yticks(size = 20)
plt.xlabel("Month",size = 20)
plt.ylabel("AVERAGE MEAN TEMPERATURE",size = 15)
plt.show()

### 9. Variation of the mean Temperature Over The 12 months in the hottest country in the world: United Arab Emirates

In [None]:
df_UAE = df.loc[df["Country"] == "United Arab Emirates"]  # keep only the UAE rows
Variation_UAE = df_UAE.groupby(df_UAE.index.month).mean(numeric_only = True)
Variation_UAE = Variation_UAE.rename(index = {1:"January",2:"February" ,3:"March" ,4:"April" ,5:"May" ,6:"June" ,7:"July" ,8:"August" ,9:"September" ,10:"October" ,11:"November" ,12:"December" })

In [None]:
plt.figure(figsize=(18,8))
sns.barplot(x=Variation_UAE.index, y= 'AvgTemperature',data=Variation_UAE,palette='Set2')
plt.title('Variation of the mean Temperature Over The 12 months in the United Arab Emirates',size = 20)
plt.xticks(size = 15)
plt.yticks(size = 20)
plt.xlabel("Month",size = 20)
plt.ylabel("AVERAGE MEAN TEMPERATURE",size = 15)
plt.show()

### 10. Variation of mean Temperature over the months for each region

In [None]:
plt.figure(figsize=(30,55))
i = 1  # position of the current subplot
for region in df.Region.unique():  # one subplot per region keeps the code short
    
    region_data = df[df['Region'] == region]
    # monthly mean temperature for this region
    final_data = region_data.groupby(region_data.index.month)['AvgTemperature'].mean()

    final_data = pd.DataFrame(final_data)
    final_data = final_data.sort_index()

    final_data = final_data.rename(index = {1:"January",2:"February" ,3:"March" ,4:"April" ,5:"May" ,6:"June" ,7:"July" ,8:"August" ,9:"September" ,10:"October" ,11:"November" ,12:"December" })
    plt.subplot(4,2,i)
    sns.barplot(x=final_data.index,y='AvgTemperature',data=final_data,palette='Paired')
    plt.title(region,size = 20)
    plt.xlabel(None)
    plt.xticks(rotation = 90,size = 18)
    plt.ylabel("Mean Temperature",size = 15)
    i+=1
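The per-region monthly means computed inside the loop can also be collected into a single month-by-region table, which is handy for comparing regions side by side. A minimal sketch on made-up data (the `toy` frame is an assumption for illustration):

```python
import pandas as pd

# Toy frame with a datetime index, mimicking the layout of df (values made up).
toy = pd.DataFrame(
    {
        "Region": ["Africa", "Africa", "Africa", "Europe", "Europe", "Europe"],
        "AvgTemperature": [60.0, 62.0, 80.0, 40.0, 70.0, 72.0],
    },
    index=pd.to_datetime(
        ["1995-01-01", "1996-01-01", "1995-07-01",
         "1995-01-01", "1995-07-01", "1996-07-01"]
    ),
)

# Group by calendar month and region, then spread the regions into columns.
table = (
    toy.groupby([toy.index.month, "Region"])["AvgTemperature"]
    .mean()
    .unstack("Region")
)
table.index.name = "Month"

print(table)
```

Each column of `table` holds the same numbers that one iteration of the loop produces.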


### 11. The Average Temperature in the USA states

In [None]:
Average_Temperature_USA = df.loc[df["Country"] == "US"].groupby("State").mean(numeric_only = True).drop(["Additional Territories"],axis = 0)
Average_Temperature_USA.head()


#### We need to add the state codes to this data for visualization

In [None]:
usa_codes = pd.read_csv('../input/usa-states-codes/csvData.csv')
usa_codes =usa_codes.set_index("State")
Average_Temperature_USA = pd.merge(Average_Temperature_USA,usa_codes,how = "left",right_index = True,left_index = True)
Average_Temperature_USA.head()

In [None]:
data_usa = [dict(type = "choropleth",autocolorscale = False, locations=  Average_Temperature_USA["Code"], z = Average_Temperature_USA["AvgTemperature"] ,
              locationmode="USA-states",
              text = Average_Temperature_USA.index,colorscale = "reds",colorbar = dict(title = "Temperature"))]                         
layout_usa = dict(title = "The Average Temperature in the USA states",geo = dict(scope = "usa",projection = dict(type = "albers usa"),showlakes = True,lakecolor = "rgb(66,165,245)",),)

In [None]:
fig_usa = dict(data = data_usa,layout=layout_usa)
iplot(fig_usa,filename = "d3-choropleth-map")

### 12. Average Temperature in USA from 1995 to 2020

In [None]:
Temperature_USA_year = df.loc[df["Country"] == "US"].groupby(pd.Grouper(freq = "Y")).mean(numeric_only = True)
Temperature_USA_year.head()
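A yearly mean is only comparable across years when each year is fully covered. If the last year in the data stops part-way through (an assumption to check on the real file), its mean reflects only the months recorded. Counting observations per year makes such partial years visible. The date range in this sketch is made up:

```python
import pandas as pd

# Toy daily series that stops in the middle of its last year (dates made up).
dates = pd.date_range("2019-01-01", "2020-05-13", freq="D")
toy = pd.DataFrame({"AvgTemperature": 60.0}, index=dates)

# Number of daily readings available in each calendar year.
days_per_year = toy.groupby(toy.index.year)["AvgTemperature"].count()

# Years with clearly fewer readings than a full year are flagged as partial.
partial_years = days_per_year[days_per_year < 360].index.tolist()

print(days_per_year)
print("partial years:", partial_years)
```

The threshold of 360 is an arbitrary choice for this sketch; with several cities per year, the count per city would be the fairer measure.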

In [None]:
plt.figure(figsize = (15,8))
sns.barplot(x = Temperature_USA_year.index.year,y = "AvgTemperature",data = Temperature_USA_year)
plt.yticks(size = 15)
plt.xticks(size = 15,rotation = 90)
plt.xlabel(None)
plt.ylabel("Average Temperature",size = 15)
plt.title("Average Temperature in USA from 1995 to 2020",size = 20)
plt.show()