# Visualisations with Matplotlib & Seaborn

Today, I'm going to walk you through making visualisations in Matplotlib and Seaborn. You have already seen some of my basic Matplotlib plots, so this time I'll go step by step through how to make different types of graphs. Matplotlib, although very useful, is rather old (first released in 2003!) and its default graphs can look dated.

So afterwards, we're going to look at another library called Seaborn, and see how the two libraries differ, both visually and in practice.

There will be an exercise using real-life data, since you will be working on a project next class.

If there is time, we will go through the other notebook on how to make maps, and if not, you can always go through that yourself! 

Let's get started! 

## Visualisations with Matplotlib

### Line Plots

Line graphs are perhaps the most basic form of visualisation: each (x, y) coordinate pair is drawn as a marker, and consecutive markers are connected with straight lines.
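To see the idea in isolation, here is a minimal sketch using a handful of made-up (x, y) points (hypothetical toy data, not from any dataset in this notebook). `marker='o'` draws each coordinate pair, and the line joins consecutive points in order.

```python
import matplotlib.pyplot as plt

# Hypothetical toy data: five (x, y) coordinate pairs
x = [0, 1, 2, 3, 4]
y = [1, 3, 2, 5, 4]

fig, ax = plt.subplots(figsize=(6, 4))
# marker='o' draws each point; straight segments connect consecutive points
line, = ax.plot(x, y, marker='o', color='#800000')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('A minimal line plot')
plt.show()
```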


In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pandas as pd

df = pd.read_csv('/home/dhruvy/Desktop/GlobalTemperatures.csv')
# Keep only the date and the two temperature columns; .copy() avoids a SettingWithCopyWarning on the next line
temp = df[['dt','LandAverageTemperature','LandAndOceanAverageTemperature']].copy()
temp['dt'] = pd.to_datetime(temp['dt'])
temp.head(10)

In [None]:
plt.figure(figsize=(20, 10), dpi=100)
plt.plot(temp['dt'],temp['LandAverageTemperature'], color='#800000')
plt.show()

In [None]:
plt.figure(figsize=(20,10), dpi=50)
plt.plot(temp['dt'],temp['LandAverageTemperature'], color='#800000', label='Land')
plt.plot(temp['dt'],temp['LandAndOceanAverageTemperature'], color='#FFFF00', label='Land and ocean')
plt.legend()  # label the two lines so we can tell them apart
plt.show()

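Not very useful, is it? It doesn't tell us much, and it even looks as though nothing changes. Each monthly value swings up and down with the seasons, and that swing hides any long-term trend. Perhaps it would be more useful to plot the data in a different way. What do you think?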
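One possible answer, sketched on hypothetical synthetic data (a made-up seasonal cycle plus a slow trend, not the real temperature file): averaging the 12 months of each year cancels the seasonal swing, so the trend becomes visible. This is just one option; the cells below take a different route and split the data by season instead.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical synthetic data: 30 years of monthly values with a strong
# seasonal cycle (amplitude 5) on top of a slow trend (+0.02 per year)
dates = pd.date_range('1900-01-01', periods=30 * 12, freq='MS')
years = dates.year.to_numpy()
months = dates.month.to_numpy()
seasonal = 5 * np.sin(2 * np.pi * (months - 1) / 12)
trend = 8 + 0.02 * (years - 1900)
monthly = pd.Series(trend + seasonal, index=dates)

# Averaging the 12 months of each year cancels out the seasonal cycle
annual = monthly.groupby(monthly.index.year).mean()

fig, axes = plt.subplots(2, 1, figsize=(10, 6))
axes[0].plot(monthly.index, monthly.values, color='#800000')
axes[0].set_title('Monthly values: the seasons dominate')
axes[1].plot(annual.index, annual.values, color='#800000')
axes[1].set_title('Annual means: the trend becomes visible')
fig.tight_layout()
plt.show()
```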


In [None]:
# Let's use pandas to split each date into a year and a month

temp['year'] = temp['dt'].map(lambda x : x.year)
temp['month'] = temp['dt'].map(lambda x : x.month)
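For datetime columns, pandas also offers the vectorised `.dt` accessor, which gives the same result as mapping a lambda over each date. A small sketch on a few hypothetical dates standing in for the `dt` column:

```python
import pandas as pd

# Hypothetical dates standing in for the 'dt' column
dates = pd.Series(pd.to_datetime(['1750-01-01', '1850-06-01', '2015-12-01']))

# The approach used above: apply a lambda to every Timestamp
years_lambda = dates.map(lambda x: x.year)
months_lambda = dates.map(lambda x: x.month)

# The .dt accessor does the same without a Python-level function call per row
years_dt = dates.dt.year
months_dt = dates.dt.month

print(years_dt.tolist(), months_dt.tolist())
```

We keep the lambda version in this notebook because it introduces `map` and `lambda`, which come up again below.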


In [None]:
# What is a lambda function? Have a look at the example below.

def h(x):
    return x**2

print(h(8))

f = lambda x: x**2

print(f(8))

# f performs exactly the same operation as h above: a lambda is an anonymous,
# one-expression function that you can write inline

In [None]:
"""The map function on the other hand returns a list. Take an argument, and iterate through it and apply
something to every iterable, and return the result as a list. 

Let's say you give me a list that you want squared. You already know the way in this cell """

oneten = [1,2,3,4,5,6,7,8,9,10]

def squ(x):
    sq=[]
    for i in x:
        sq.append(i**2)
    return sq

print(squ(oneten))

In [None]:
# The map function does exactly the same thing, just more concisely. The basic syntax is map(function, iterable)

print(list(map(lambda x: x**2, oneten)))

# You can also pass a named function into map
def squared(x):
    return x**2

print(list(map(squared, oneten)))
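One detail worth seeing for yourself: in Python 3, `map()` hands back a one-shot iterator rather than a list, and a list comprehension is the other common way to write the same thing. A short sketch:

```python
oneten = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# In Python 3, map() returns a lazy iterator, not a list
squares_iter = map(lambda x: x ** 2, oneten)
print(type(squares_iter))

# list() consumes the iterator and materialises the results
squares = list(squares_iter)

# A list comprehension produces the same list
squares_comp = [x ** 2 for x in oneten]
print(squares == squares_comp)

# The iterator is now exhausted, so iterating it again yields nothing
print(list(squares_iter))
```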

In [None]:
def label_seasons(month): # A function we can apply to the month column to label seasons
    if 3 <= month <= 5:
        return 'spring'
    elif 6 <= month <= 8:
        return 'summer'
    elif 9 <= month <= 11:
        return 'autumn'
    else:
        return 'winter'  # December, January and February

first_year = temp['year'].min()
last_year = temp['year'].max()
year_range = range(first_year, last_year + 1) # getting our range to iterate through


# Using a similar operation from before to create a new column seasons by applying the label_seasons function 
temp['season'] = temp['month'].apply(label_seasons) 


summer_temps = []
autumn_temps = []
winter_temps = []
spring_temps = []

for year in year_range:
    cur_year_temps = temp[temp['year'] == year]
    summer_temps.append(cur_year_temps[cur_year_temps['season'] == 'summer']['LandAverageTemperature'].mean())
    autumn_temps.append(cur_year_temps[cur_year_temps['season'] == 'autumn']['LandAverageTemperature'].mean())
    winter_temps.append(cur_year_temps[cur_year_temps['season'] == 'winter']['LandAverageTemperature'].mean())
    spring_temps.append(cur_year_temps[cur_year_temps['season'] == 'spring']['LandAverageTemperature'].mean())
    
plt.figure(figsize=(15, 10), dpi=80)
plt.plot(year_range, summer_temps, label='Summers average temperature', color='orange')
plt.plot(year_range, autumn_temps, label='Autumns average temperature', color='r')
plt.plot(year_range, winter_temps, label='Winters average temperature', color='b')
plt.plot(year_range, spring_temps, label='Springs average temperature', color='g')
plt.xlim(first_year,last_year)
plt.ylim(0,18)
plt.xlabel('Year', fontsize=20)
plt.ylabel('Average Temperature',fontsize=20)
plt.title('Average Temperature by Season from 1750 to 2015',fontsize=32)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5), frameon=True, borderpad=1, borderaxespad=1)
plt.savefig('/home/dhruvy/Desktop/s.pdf') # other extensions that work are .png or .jpg
plt.show()

# P.S. If you want to control where the 'ticks' appear, call these before plt.show():
#     ax = plt.gca()
#     ax.set_xticks([put a list of the ticks you want to see here])
#     ax.set_yticks([same here])

# Now you know how to use pandas to prepare data for visualisations in matplotlib!
# Let's have a look at how to make other types of charts quickly.
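The four lists and the loop above can also be replaced by a single pandas `groupby`. A sketch on a small hypothetical stand-in for `temp` (two years of made-up monthly values), using the same season boundaries as this section (spring = March to May, summer = June to August, and so on):

```python
import numpy as np
import pandas as pd

def label_seasons(month):
    # Month-to-season mapping used in this section
    if 3 <= month <= 5:
        return 'spring'
    elif 6 <= month <= 8:
        return 'summer'
    elif 9 <= month <= 11:
        return 'autumn'
    return 'winter'

# Hypothetical stand-in for `temp`: two years of monthly data
demo = pd.DataFrame({
    'year': np.repeat([2000, 2001], 12),
    'month': np.tile(np.arange(1, 13), 2),
})
demo['LandAverageTemperature'] = np.arange(24, dtype=float)
demo['season'] = demo['month'].apply(label_seasons)

# One groupby replaces the loop: rows are years, columns are seasons
seasonal = demo.groupby(['year', 'season'])['LandAverageTemperature'].mean().unstack()
print(seasonal)

# Each column can then be plotted directly, e.g.
# plt.plot(seasonal.index, seasonal['summer'], label='Summers average temperature')
```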

### Pie Charts

I don't have a dataset for this one, so let's look at a hand-made example.

Pie charts show how a whole is divvied up, so to speak, between several parts. Each slice's size is proportional to that part's share of the total, which makes a pie chart a simple way to show proportions.
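To make "proportional" concrete: a slice's angle is its share of the total times 360 degrees. A sketch using the top-10 population figures from this section, with `autopct` (a standard `plt.pie` parameter) printing each share on its slice:

```python
import numpy as np
import matplotlib.pyplot as plt

# Top-10 population figures used in this section's pie chart
values = np.array([1388232693, 1342512706, 326474013, 263510146, 211243220,
                   196744376, 191835936, 164827718, 143375006, 130222815])
labels = ['China', 'India', 'USA', 'Indonesia', 'Brazil',
          'Pakistan', 'Nigeria', 'Bangladesh', 'Russia', 'Mexico']

# Each slice's share of the whole, and the angle it gets in the pie
shares = values / values.sum()
angles = shares * 360

for label, share, angle in zip(labels, shares, angles):
    print(f'{label:<11} {share:6.1%} {angle:6.1f} degrees')

# autopct writes each slice's percentage of the top-10 total onto the chart
plt.pie(values, labels=labels, autopct='%1.1f%%')
plt.axis('equal')  # keep the pie circular
plt.title('Share of the top-10 total')
plt.show()
```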



In [None]:
# Setting values here for the top 10 populations in the world
values = [1388232693,1342512706,326474013,263510146,211243220,196744376,191835936,164827718,143375006,130222815]
colors = plt.cm.tab10.colors  # 10 distinct colours, one per country (a 5-colour list would repeat)
labels = ['China','India','USA','Indonesia','Brazil','Pakistan','Nigeria','Bangladesh','Russia','Mexico']
plt.pie(values, colors=colors, labels=labels)
plt.axis('equal')  # keep the pie circular
plt.title('Top 10 Populations (~60% of total world population)')
plt.show()

# You can also make the pie chart "explode". Try adding the following list before the plt.pie() call:
#     explode = [0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0, 0, 0, 0]
# Then include the argument explode=explode in plt.pie() and see what happens.

### Bar Charts

Let's use another dataset to illustrate how to make a bar chart. Load the gtd.csv file (the Global Terrorism Database) first.


In [None]:
gtd = pd.read_csv('/home/dhruvy/Desktop/gtd.csv')
gtd.head()


In [None]:
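from collections import Counter
# Count how many attacks were recorded in each year
year_list = Counter(gtd['iyear'])
top_20 = {}

# most_common(20) returns the 20 (year, count) pairs with the highest counts
for x,y in year_list.most_common(20):
    top_20[x] = y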
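pandas can do the same counting on its own: `value_counts()` returns counts sorted from most to least common. A sketch on a hypothetical stand-in for `gtd['iyear']`, comparing both approaches:

```python
from collections import Counter
import pandas as pd

# Hypothetical stand-in for gtd['iyear']
iyear = pd.Series([2014, 2014, 2014, 2015, 2015, 1992])

# Counter approach, as in the cell above
top_counter = dict(Counter(iyear).most_common(2))

# pandas approach: value_counts() already sorts by count, descending
top_pandas = iyear.value_counts().head(2).to_dict()

print(top_counter, top_pandas)
```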

In [None]:
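# In Python 3, dict.values() and dict.keys() are views, so convert them to lists for matplotlib
plt.bar(range(len(top_20)), list(top_20.values()), align='center')
plt.xticks(range(len(top_20)), list(top_20.keys()), rotation='vertical')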
plt.suptitle('20 Years with the Most Terrorist Attacks since 1970')
plt.xlabel('Year')
plt.ylabel('# of Terrorist Attacks')
plt.show()

### Scatter Plots

Let's use the same data to make a scatter plot and see whether the number of people killed in an attack is related to the number wounded.


In [None]:
x = gtd['nkill']
y = gtd['nwound']

plt.title('Scatter plot of fatalities vs wounded')
plt.xlabel('Killed')
plt.ylabel('Wounded')
plt.scatter(x,y)
plt.show()
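A scatter plot lets you eyeball a relationship; to put a number on it, pandas' `Series.corr()` computes the Pearson correlation coefficient (from -1 to 1), ignoring rows where either value is missing. A sketch on hypothetical stand-in columns:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for gtd[['nkill', 'nwound']]; NaN marks missing values
attacks = pd.DataFrame({
    'nkill':  [0, 1, 2, np.nan, 4, 10],
    'nwound': [1, 3, 5, 2, np.nan, 21],
})

# Pearson correlation; rows where either value is NaN are left out
r = attacks['nkill'].corr(attacks['nwound'])
print(round(r, 3))
```

Here the remaining rows lie exactly on a straight line, so r comes out as 1. On real data, a value near 0 would suggest little linear relationship.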

### Histograms

Histograms are quite simple to make, as long as you remember the concept. Remember all those PDFs (probability density functions) we discussed last time? A histogram is generally used to check whether your data's distribution matches a known PDF. They're very quick to make in Python, and are often used for a first, cursory look at your data.



In [None]:
incomes = np.random.normal(77000, 15000, 10000)
plt.hist(incomes, 50)
plt.xlabel('Incomes')
plt.ylabel('Count')
plt.show()
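To compare a histogram with a PDF directly, you can pass `density=True`, which rescales the bars so their total area is 1, and overlay the PDF on top. A sketch for the normal distribution, writing its formula out in NumPy (the seed is an arbitrary choice for reproducibility):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
mu, sigma = 77000, 15000
incomes = rng.normal(mu, sigma, 10000)

# density=True scales the bars so their total area is 1, like a PDF
counts, bins, _ = plt.hist(incomes, 50, density=True, alpha=0.6, label='Histogram')

# Normal PDF: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
xs = np.linspace(bins[0], bins[-1], 200)
pdf = np.exp(-(xs - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
plt.plot(xs, pdf, color='k', label='Normal PDF')
plt.xlabel('Incomes')
plt.ylabel('Density')
plt.legend()
plt.show()
```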

We're done with Matplotlib for now! This is not the whole of Matplotlib, of course, only the major types of visualisation that you will probably use. Now that you understand the basics, have a look here to see what else you can make with Matplotlib.

[Matplotlib Examples & Documentation](http://matplotlib.org/1.5.1/gallery.html)

## Seaborn

Seaborn is another very useful library for making beautiful visualisations. It is built on top of Matplotlib and works directly with pandas DataFrames, so many statistical plots take less code. I'd like to take you through the basics of Seaborn and show how using it differs from using Matplotlib.

In [None]:
#First let's import seaborn. The convention is to import it as sns
import seaborn as sns

In [None]:
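# Let's go back to our global warming dataset. Seaborn also restyles ordinary matplotlib plots:
# sns.set() applies seaborn's theme, and set_color_codes("pastel") remaps the shorthand
# colours such as 'r', 'g' and 'b' to seaborn's pastel palette
sns.set(style="darkgrid")
sns.set_color_codes("pastel")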
plt.figure(figsize=(15, 10), dpi=80)
plt.plot(year_range, summer_temps, label='Summers average temperature', color='orange')
plt.plot(year_range, autumn_temps, label='Autumns average temperature', color='r')
plt.plot(year_range, winter_temps, label='Winters average temperature', color='b')
plt.plot(year_range, spring_temps, label='Springs average temperature', color='g')
plt.xlim(first_year,last_year)
legend = plt.legend(loc='center left', bbox_to_anchor=(1, 0.5), frameon=True, borderpad=1, borderaxespad=1)

In [None]:
# Let's make a cooler bar graph using another global warming dataset
by_country = pd.read_csv('/home/dhruvy/Desktop/GlobalLandTemperaturesByCountry.csv')
# Drop continent rows and duplicate country entries; the '(Europe)' versions are kept and renamed below
by_country = by_country[~by_country['Country'].isin(
        ['Denmark', 'Antarctica', 'France', 'Europe', 'Netherlands','United Kingdom', 'Africa', 'South America'])]


by_country_fixed = by_country.replace(
   ['Denmark (Europe)', 'France (Europe)', 'Netherlands (Europe)', 'United Kingdom (Europe)'],
   ['Denmark', 'France', 'Netherlands', 'United Kingdom'])


countries = np.unique(by_country_fixed['Country'])
avg_temp = []
for country in countries:
    avg_temp.append(by_country_fixed[by_country_fixed['Country'] == 
                                               country]['AverageTemperature'].mean())

In [None]:
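mean_temp_bar, countries_bar = zip(*sorted(zip(avg_temp, countries), reverse = True))

# A little word on zip.
# zip is a nifty built-in Python function that pairs up the elements of two (or more) lists.
# Sorting the pairs sorts by their first element, so sorted(zip(avg_temp, countries)) orders
# the countries by temperature, and zip(*...) splits the pairs back into two sequences.
# Try this code out and see what it does:
#     x = [1, 2, 3]
#     y = [4, 5, 6]
#     zipped = list(zip(x, y))
#     print(zipped)
#     x2, y2 = zip(*zipped)
#     print(list(x2) == x)
#     print(list(y2) == y)
#
# Can you figure out what the code above did now?

sns.set(font_scale=0.9)
f, ax = plt.subplots(figsize=(6, 50))
colors = sns.color_palette('coolwarm', len(countries))
# Recent seaborn versions expect x= and y= as keywords; reversing the palette makes the hottest countries red
sns.barplot(x=list(mean_temp_bar), y=list(countries_bar), palette=colors[::-1], ax=ax)
titles = ax.set(xlabel='Average temperature', title='Average land temperature in countries')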
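For comparison, pandas can do the per-country loop and the `zip`/`sorted` step in one chain: `groupby(...).mean()` then `sort_values()`. A sketch on a hypothetical stand-in for `by_country_fixed` (made-up readings for three countries):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for by_country_fixed
readings = pd.DataFrame({
    'Country': ['Mali', 'Mali', 'Norway', 'Norway', 'Chile', 'Chile'],
    'AverageTemperature': [28.0, 30.0, 1.0, np.nan, 9.0, 11.0],
})

# groupby replaces the loop over np.unique(...); mean() skips NaN by default,
# and sort_values replaces the zip/sorted step
country_means = (readings.groupby('Country')['AverageTemperature']
                 .mean()
                 .sort_values(ascending=False))

print(country_means)
# These could go straight into the bar plot, e.g.
# sns.barplot(x=country_means.values, y=country_means.index, ax=ax)
```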

### Advanced: Making Heatmaps

Heatmaps are a great tool for spotting patterns over a period of time, or for showing how a value varies across a grid. I don't expect you to understand all of this example yet, but it's cool to see what's possible.

In [None]:
df = pd.read_csv('/home/dhruvy/Desktop/GlobalTemperatures.csv')
df = df[df["dt"] >= "1900-01-01"][["dt", "LandAverageTemperature",]]
df['dt'] = pd.to_datetime(df['dt'])
df.head(10)

In [None]:
import calendar

# Making new columns for year and month (month as a three-letter abbreviation, e.g. 'Jan')
df['year'] = df['dt'].map(lambda x: x.year)
df['month'] = df['dt'].map(lambda x: calendar.month_abbr[x.month])

# Long-term mean temperature for each calendar month (a Series indexed by month)
avg_temp_month = df.groupby('month')['LandAverageTemperature'].mean()

# .loc is pandas' label-based indexer: avg_temp_month.loc['Jan'] returns January's mean.
# We use it to look up the long-term mean for each row's month
df['finalavgtemp'] = df['month'].apply(lambda x: avg_temp_month.loc[x])

# Positive values: warmer than that month's long-term average; negative: cooler
df['difference'] = df['LandAverageTemperature'] - df['finalavgtemp']
df.head()
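An alternative to the lookup with `.loc` is `groupby(...).transform('mean')`, which returns each row's group mean already aligned to the original rows. A sketch on a hypothetical stand-in for `df` (two years of made-up readings for two months):

```python
import pandas as pd

# Hypothetical stand-in for df: two years of readings for two months
demo = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'LandAverageTemperature': [1.0, 3.0, 2.0, 5.0],
})

# transform('mean') gives every row the mean of its month, in the original row order
demo['finalavgtemp'] = demo.groupby('month')['LandAverageTemperature'].transform('mean')
demo['difference'] = demo['LandAverageTemperature'] - demo['finalavgtemp']
print(demo)
```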


In [None]:
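final_df = df[['year', 'month', 'difference']].groupby(by=['year', 'month']).sum()

# unstack() pivots the inner index level (month) into columns, giving one row per year
final_df = final_df.unstack()

# .xs() takes a cross-section: with axis=1 it works on the columns, dropping the outer
# 'difference' level so that we are left with one column per month.
# [::-1] then reverses the rows so the most recent year is at the top
final_df = final_df.xs("difference", axis=1)[::-1]

# Month abbreviations sort alphabetically (Apr, Aug, Dec, ...), so restore calendar order
final_df = final_df[calendar.month_abbr[1:]]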



In [None]:
plt.figure(figsize=(15, 10))
ax = sns.heatmap(final_df, yticklabels=8, cmap="magma", vmin=-3)
plt.yticks(rotation=0)  # keep year labels horizontal (set after heatmap, which creates the labels)
plt.show()

## There you go!

Hope you had fun with visualisations! Have a look at the Seaborn documentation and its example gallery to learn more.