# Real World Examples - Scatter, Bar & Line Charts - Solution
Keith Galli task - Line graphs based on video https://www.youtube.com/watch?v=0P7QnIQDBJY&feature=youtu.be  
Scatter, Bar, Line Graphs - other sources used throughout

THIS VERSION OF THE NOTEBOOK REFERENCES THE DATAFRAME **df**

#### Load the Libraries

In [None]:
#Load Necessary Libraries
import numpy as np
import pandas as pd

#This is a library for creating graphs - 
    #sometimes additional libraries are also needed
    #& matplotlib is not the only option for creating graphs
import matplotlib.pyplot as plt


## Cleaning the Data

Before creating any of the charts below, you should check the datasets for null values.

#### Cleaning the Gas Prices Dataset

In [None]:
#Read in the gas_prices file
df = pd.read_csv('gas_prices.csv')

#Examine the DataFrame - it provides gas prices for 10 countries over an 18 year period
df

In [None]:
#get the shape of the dataframe. It is quite small
df.shape

In [None]:
#count the number of null values in the dataframe
df.isna().sum()

In [None]:
#get the precise location of the null value - in the Australia column
df[df['Australia'].isnull()]

In [None]:
#As this is the first row in our dataframe - the simplest thing might be to delete the row
df = df.dropna(how='any', subset=['Australia'])
df['Australia'].isna().sum()

In [None]:
#Get the shape of the DataFrame again
df.shape
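
The clean-up steps above can be tried out on a small made-up table before running them on the real file. The sketch below assumes a toy DataFrame called `toy` (the numbers are invented, not taken from gas_prices.csv) and shows that `dropna` with `subset` removes only the rows where the named column is null.

```python
import numpy as np
import pandas as pd

# Toy data: the numbers are invented for illustration only
toy = pd.DataFrame({
    'Year': [1990, 1991, 1992],
    'Australia': [np.nan, 1.96, 1.89],
    'Canada': [1.87, 1.92, 1.73],
})

# Count the null values in each column
null_counts = toy.isna().sum()

# Show the rows where the Australia column is null
null_rows = toy[toy['Australia'].isna()]

# Drop only the rows that have a null in the Australia column
cleaned = toy.dropna(subset=['Australia'])

print(null_counts)
print(null_rows)
print(cleaned.shape)
```

Dropping a row suits this case because only one record is affected. With many gaps, filling the values might be worth considering instead.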

## Creating Charts  


### Line Graph showing Prices

#### Display the Dataset

In [None]:
#Examine the dataframe 
df

#### Very Basic Line Graph

In [None]:
#This code will extract the year and the price for the USA from the dataframe
plt.plot(df.Year, df.USA)
#This code will extract the year and the price for Canada from the dataframe
plt.plot(df.Year, df.Canada)
#This code will extract the year and the price for Australia from the dataframe
plt.plot(df.Year, df.Australia)

#If your column names have more than one word you will have to use the following format:
plt.plot(df['Year'], df['South Korea'])

#This code will show the chart
plt.show()

#### More Detailed Line Graph
  
The graph above provides a very poor visual representation of the data. It has no title, and there are no labels on the axes. There is no legend to tell us which line is which. The x axis values are hard to read. We will address these issues below:


In [None]:
#You can control the size of your chart
plt.figure(figsize=(8,5))

#You can add a title with formatting using a font dictionary
plt.title('Gas Prices over Time (in USD)', fontdict={'fontweight':'bold', 'fontsize': 18})

#This code uses short hand notation to format the style and colour of the lines in the chart
#We can change the labels on our individual lines if we want to make them more meaningful than the default column names
plt.plot(df.Year, df.USA, 'b.-', label='United States')
plt.plot(df.Year, df.Canada, 'r.-', label='Canada')
plt.plot(df.Year, df['South Korea'], 'g.-', label='South Korea')
plt.plot(df.Year, df.Australia, 'y.-',label='Australia' )

#This code controls how the values on the x are displayed - the ticks indicate 3 year intervals
plt.xticks(df.Year[::3].tolist())

#Although our records stop at 2008 - we might want more space at the edge of our chart - so we can add another tick interval
#plt.xticks(df.Year[::3].tolist()+[2009])

#Add labels to the x and y axis
plt.xlabel('Year')
plt.ylabel('US Dollars')

#Add a legend to the chart - this may not work if you have not added labels to the lines (Like we did above)
plt.legend()

#Save the chart as a separate image
#plt.savefig('Gas_price_figure.png', dpi=300)

#display the chart within the notebook
plt.show()
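
The shorthand strings used above, such as `'b.-'`, pack a colour, a marker and a line style into one argument. The sketch below uses invented values and draws one line with the shorthand and one with the equivalent keyword arguments, so the two notations can be compared.

```python
import matplotlib.pyplot as plt

years = [1990, 1991, 1992, 1993]      # invented values
prices = [1.2, 1.1, 1.3, 1.4]         # invented values

fig, ax = plt.subplots(figsize=(8, 5))

# Shorthand: 'b.-' means blue colour, point markers, solid line
(short_line,) = ax.plot(years, prices, 'b.-', label='shorthand')

# The same styling written out with keyword arguments
(long_line,) = ax.plot(years, [p + 0.5 for p in prices],
                       color='b', marker='.', linestyle='-',
                       label='keywords')

ax.legend()
```

The keyword form is longer but easier to read, and it allows any named colour rather than only the single-letter codes.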

####  An Alternative Way to Display Many Values in a Line Chart
Look at the FOR loop section

In [None]:
#This code - produces the same chart as the code above.
#You may find the code above easier to understand.

#You can control the size of your chart
plt.figure(figsize=(8,5))

#You can add a title with formatting by using a font dictionary
plt.title('Gas Prices over Time (in USD)', fontdict={'fontweight':'bold', 'fontsize': 18})

# Another Way to plot many values!
countries_to_look_at = ['Australia', 'USA', 'Canada', 'South Korea']
for country in df:                       #For each column name in the dataframe
    if country in countries_to_look_at:   #If the column name appears in the list of countries_to_look_at
        plt.plot(df.Year, df[country], marker='.', label=country)   #Plot the line - Set the label equal to country

        
#This code controls how the values on the x are displayed - the ticks indicate 3 year intervals
#Although our records stop at 2008 - we might want more space at the edge of our chart - so we can add another tick interval
plt.xticks(df.Year[::3].tolist()+[2009])

#Add labels to the x and y axis
plt.xlabel('Year')
plt.ylabel('US Dollars')

#Add a legend to the chart - this may not work if you have not added labels to the lines (Like we did above)
plt.legend()

#Save the chart as a separate image
#plt.savefig('Gas_price_figure.png', dpi=300)

#display the chart within the notebook
plt.show()
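
The FOR loop above works because looping over a DataFrame gives its column names, not its rows. The sketch below shows this on a toy DataFrame (the values are invented).

```python
import pandas as pd

# Toy data: the numbers are invented for illustration only
toy = pd.DataFrame({
    'Year': [1990, 1991],
    'Australia': [1.9, 2.0],
    'USA': [1.2, 1.1],
    'South Korea': [2.1, 2.5],
})

# Looping over a DataFrame gives the column names, in column order
column_names = [name for name in toy]

# Keep only the column names that appear in our list
countries_to_look_at = ['USA', 'South Korea']
selected = [name for name in toy if name in countries_to_look_at]

print(column_names)
print(selected)
```

Looping over the DataFrame keeps the lines in the DataFrame's column order. Looping over `countries_to_look_at` directly would also work and would keep the order of the list instead.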

## TASKS

### 1 Create a Detailed Line Graph  to Show the Gas Prices for Germany, Italy,  Mexico, UK and Japan
   
* Make the size of the chart 16 * 8 
* Provide a meaningful title at font size 16
* Use 2 year intervals
* Use a Circle marker - google 'matplotlib markers' or check out this link: https://matplotlib.org/stable/api/markers_api.html
* Rotate the labels in the x axis 45 degrees - google 'matplotlib rotate x axis labels' or check out this link: https://www.geeksforgeeks.org/how-to-rotate-x-axis-tick-label-text-in-matplotlib/

In [None]:
##Detailed Line Chart

#You can control the size of your chart
plt.figure(figsize=(16,8))

#You can add a title with formatting by using a font dictionary
plt.title('Gas Prices over Time (in USD)', fontdict={'fontweight':'bold', 'fontsize': 16})

# Another Way to plot many values!
countries_to_look_at = ['Germany', 'Italy', 'Mexico', 'UK', 'Japan']
for country in df:                       #For each column name in the dataframe
    if country in countries_to_look_at:   #If the column name appears in the list of countries_to_look_at
        plt.plot(df.Year, df[country], marker='o', label=country)   #Plot the line - Set the label equal to country

        
#This code controls how the values on the x are displayed - the ticks indicate 2 year intervals
#Although our records stop at 2008 - we might want more space at the edge of our chart - so we can add another tick interval
plt.xticks(df.Year[::2].tolist()+[2009], rotation=45)

#Add labels to the x and y axis
plt.xlabel('Year')
plt.ylabel('US Dollars')

#Add a legend to the chart - this may not work if you have not added labels to the lines (Like we did above)
plt.legend()

#Save the chart as a separate image
#plt.savefig('Gas_price_figure.png', dpi=300)

#display the chart within the notebook
plt.show()
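
The tick interval and the label rotation in the solution above can also be set through the axes object. The sketch below is one possible alternative, not the only way to do it. It assumes a made-up range of years from 1991 to 2008 and a flat placeholder line in place of the real prices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumed range of years - a placeholder for df.Year
years = pd.Series(range(1991, 2009))

fig, ax = plt.subplots(figsize=(16, 8))

# Flat placeholder line with circle markers
ax.plot(years, [1.0] * len(years), marker='o')

# Every second year, plus one extra tick for space at the right edge
tick_years = years[::2].tolist() + [2009]
ax.set_xticks(tick_years)

# Rotate the x axis tick labels by 45 degrees
ax.tick_params(axis='x', labelrotation=45)

print(tick_years)
```

The slice `[::2]` takes every second value, so changing the 2 to another number changes the interval.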

### 2 Create a Basic Scatterplot

This link will help: https://www.earthdatascience.org/courses/scientists-guide-to-plotting-data-in-python/plot-with-matplotlib/introduction-to-matplotlib-plots/customize-plot-colors-labels-matplotlib/    

* Make the size of the chart 16 * 6 
* Show the Gas Prices for Germany - Look at the section: **Create Different Types of Matplotlib Plots: Scatter and Bar Plots**. Test that your chart works before doing the next steps
* Provide a meaningful title, and labels on the x and y axis - Look at the section: **Customize Plot Title and Axes Labels**
* Use a pixel marker, a color cyan, an edgecolor darkblue, and an alpha value of 0.3 - Look at the section: **Custom Markers in Line and Scatter Plots**
* Use 1 year intervals - Look at previous example
* Rotate the labels in the x axis 45 degrees - Look at previous example

In [None]:
#Basic Chart

# Define plot space
fig, ax = plt.subplots(figsize=(16, 6))

# Define x and y axes
ax.scatter(df.Year, 
        df.Germany,
        #marker = ',',
        color='cyan',
        #edgecolor='darkblue',
        alpha=0.3)

ax.scatter(df.Year, 
        df.Italy,
        #marker = ',',
        color='yellow',
        #edgecolor='darkblue',
        alpha=0.3)

# Set plot title and axes labels
ax.set(title = "Gas Prices Over Time in USD",
       xlabel = "Year",
       ylabel = "Price\n(USD)")

#format the ticks to 1 year intervals
plt.xticks(df.Year[::1].tolist()+[2009], rotation=45)

#display the chart within the notebook
plt.show()

Modify the chart above so that it is a Bar Plot (change ax.scatter to ax.bar) - you will get an error, because a Bar Plot does not use a marker. Comment out that code.

Modify the chart above so that it is a Line Plot (change ax.bar to ax.plot) - you will be able to include the marker value again, but you will get an error because a Line Plot does not use an edgecolor. Comment out that code.

Change your chart back to a scatterplot 
* Add another country - Italy and set the color to yellow 
* View the data as bar plot  
* View the data as a line plot
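
The errors described above come from the fact that each plot type accepts a different set of style keywords. The sketch below uses invented values and a small hypothetical helper called `accepts` to check which keywords a bar plot and a line plot will take. The exact exception type may differ between matplotlib versions, so the helper catches both `AttributeError` and `TypeError`.

```python
import matplotlib.pyplot as plt

years = [1991, 1992, 1993]     # invented values
prices = [3.2, 3.4, 3.1]       # invented values

fig, ax = plt.subplots(figsize=(16, 6))

# A scatter plot accepts both a marker and an edgecolor
ax.scatter(years, prices, marker=',', color='cyan',
           edgecolor='darkblue', alpha=0.3)

def accepts(plot_function, **style):
    """Return True if the plotting call succeeds with the given style keywords."""
    try:
        plot_function(years, prices, **style)
        return True
    except (AttributeError, TypeError):
        return False

bar_takes_marker = accepts(ax.bar, marker=',')
bar_takes_edgecolor = accepts(ax.bar, edgecolor='darkblue')
line_takes_marker = accepts(ax.plot, marker=',')
line_takes_edgecolor = accepts(ax.plot, edgecolor='darkblue')

print('bar: marker', bar_takes_marker, '- edgecolor', bar_takes_edgecolor)
print('line: marker', line_takes_marker, '- edgecolor', line_takes_edgecolor)
```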

### Using Calculations in the Charts

In the Scatter Plot, Line Chart and Bar Chart examples directly above, we plotted one column (Year) against another (the prices for a country).

More often than not, we want to do more than simply extract data from the dataset. Quite often we want to perform calculations on the data (some basic, some quite complex) and then represent the results of those calculations in graphical format. 

Before creating the following charts, calculate the mean price of gas for Germany, for Australia, for Canada and for the USA, rounding each of the averages to 1 decimal place.

In [None]:
#Calculating average Gas Price in each of the Countries - you could also do max, min, median
avgG = round(df['Germany'].mean(), 1)
avgA = round(df['Australia'].mean(), 1)
avgC = round(df['Canada'].mean(), 1)
avgU = round(df['USA'].mean(), 1)

In [None]:
#Assign your results to a new list
averages = [avgG, avgA, avgC, avgU]
averages

In [None]:
#Assign the countries to a list
countries = ['Germany', 'Australia', 'Canada', 'USA']
countries
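
The averages above are calculated one country at a time. As a possible shortcut, pandas can calculate the mean of several columns in one step. The sketch below assumes a toy DataFrame with invented values.

```python
import pandas as pd

# Toy data: the numbers are invented for illustration only
toy = pd.DataFrame({
    'Year': [1991, 1992, 1993],
    'Germany': [3.0, 3.3, 3.6],
    'USA': [1.0, 1.2, 1.1],
})

countries = ['Germany', 'USA']

# Mean of each selected column, rounded to 1 decimal place
mean_prices = toy[countries].mean().round(1)

# Convert the results to a plain list, in the same order as countries
averages = mean_prices.tolist()

print(mean_prices)
print(averages)
```

The result is a Series with the country names as its index, so the names and the averages stay matched up. The same pattern works with `max`, `min` or `median` in place of `mean`.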

### 3 Create a Basic Bar Chart with Calculations
Ensure that the code in the 3 cells above has been executed first  

Again, this link, or the code above, will help: https://www.earthdatascience.org/courses/scientists-guide-to-plotting-data-in-python/plot-with-matplotlib/introduction-to-matplotlib-plots/customize-plot-colors-labels-matplotlib/

In [None]:
#Basic Chart

# Define plot space
fig, ax = plt.subplots(figsize=(16, 6))

# Define x and y axes
ax.bar(countries, 
        averages,
        #marker = ',',
        color='cyan',
        #edgecolor='darkblue',
        alpha=0.3)

# Set plot title and axes labels
ax.set(title = "Average Gas Prices Between 1991 and 2008 in USD",
       xlabel = "Country",
       ylabel = "Price\n(USD)")

plt.xticks(countries, rotation=45)

#If you want to extend the ticks try either of the following methods
#Manually specifying the ticks
#ax.set_yticks([0,1,2,3,4,5])

#Or
#extend the existing values you are using
plt.ylim([0, max(averages)+1])

plt.show()

As you know, in programming there are a variety of ways of doing the same things. Below, there is some slightly different code for a bar chart obtained from the following link  
https://pythonspot.com/matplotlib-bar-chart/

In [None]:
#set the chart size
plt.figure(figsize=(10,8), dpi=100)

#create the bars
plt.bar(countries, averages, alpha=0.5, color='pink')
#set the ticks
plt.xticks(countries)
#set the labels
plt.xlabel('Country')
plt.ylabel('Price')
#set the title
plt.title('Average Gas Prices Between 1991 and 2008 in USD')


plt.show()

Change the chart directly above to a scatter plot and then a line plot. If there are error messages address them.
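
One way to approach the exercise above is sketched below, with invented averages in place of the calculated ones. The `alpha` and `color` keywords used in the bar chart are also accepted by scatter and line plots, so this particular sketch should not raise errors. A circle marker is added to the line plot so that the individual points are visible.

```python
import matplotlib.pyplot as plt

countries = ['Germany', 'Australia', 'Canada', 'USA']
averages = [3.5, 2.1, 1.9, 1.6]     # invented values for illustration

# Two charts side by side
fig, (ax_scatter, ax_line) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot version
ax_scatter.scatter(countries, averages, alpha=0.5, color='pink')
ax_scatter.set(title='Scatter Plot', xlabel='Country', ylabel='Price')

# Line plot version
ax_line.plot(countries, averages, alpha=0.5, color='pink', marker='o')
ax_line.set(title='Line Plot', xlabel='Country', ylabel='Price')
```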

### 4 Annotating a Bar Chart  - Reproduce the Bar Chart Above

The example below uses the **seaborn** library to draw the bars and matplotlib's `annotate` function to label them.

Use the following link to get code for a bar chart that has annotations. 

https://www.geeksforgeeks.org/how-to-annotate-bars-in-barplot-with-matplotlib-in-python/
  

In [None]:
#It is good practice to keep all of the libraries that you need at the top of your Notebook, 
  #but for this demonstration we will keep them here
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# Creating our own dataframe
data = {"Countries": countries,
        "Averages": averages}
 
# Now convert this dictionary type data into a pandas dataframe
# specifying what are the column names
temp_df = pd.DataFrame(data, columns=['Countries', 'Averages'])
 
 
# Defining the plot size
plt.figure(figsize=(8, 8))
 
# Defining the values for x-axis, y-axis
# and from which dataframe the values are to be picked
plots = sns.barplot(x="Countries", y="Averages", data=temp_df)
 
# Iterating over the bars one-by-one
for bar in plots.patches:

    # Using Matplotlib's annotate function and
    # passing the coordinates where the annotation shall be done
    # x-coordinate: bar.get_x() + bar.get_width() / 2
    # y-coordinate: bar.get_height()
    # free space to be left to make graph pleasing: (0, 8)
    # ha and va stand for the horizontal and vertical alignment
    plots.annotate(format(bar.get_height(), '.2f'),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points')
 
# Setting the label for x-axis
plt.xlabel("Countries", size=14)
 
# Setting the label for y-axis
plt.ylabel("Averages", size=14)
 
# Setting the title for the graph
plt.title("This is an annotated barplot")
 
# Finally showing the plot
plt.show()
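
A bar chart can also be annotated with matplotlib alone. The sketch below is an alternative to the seaborn version above and assumes matplotlib 3.4 or later, which provides `bar_label`. The averages are invented values.

```python
import matplotlib.pyplot as plt

countries = ['Germany', 'Australia', 'Canada', 'USA']
averages = [3.5, 2.1, 1.9, 1.6]     # invented values for illustration

fig, ax = plt.subplots(figsize=(8, 8))

# Create the bars and keep the container that ax.bar returns
bars = ax.bar(countries, averages, color='cyan', alpha=0.5)

# Write each bar's height above it, formatted to 2 decimal places
labels = ax.bar_label(bars, fmt='%.2f', padding=3, size=15)

ax.set(title='This is an annotated barplot',
       xlabel='Countries',
       ylabel='Averages')
```

Here `padding` plays a similar role to `xytext` in the loop version: it leaves a little space between the top of the bar and the text.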

### 5 Extra Example Using Annotated Bar Chart: Bar Chart showing Gas Prices in 1995

In [None]:
#Filter the record with the prices for 1995 - the results are in a DataFrame
pricesList = df.loc[df['Year'] == 1995]

#Convert the results to a list
pricesList = pricesList.values.tolist()[0]

#Re-save the list without the Year
pricesList = pricesList[1:]
pricesList

In [None]:
#Get the DataFrames columns and convert the results to a list
countriesList = df.columns.values.tolist()

#Re-save the list without the Year
countriesList = countriesList[1:]
countriesList
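
The two cells above build the lists by converting to a list and slicing off the Year. A possible alternative is to make Year the index first, so the selected row already contains only the countries. The sketch below assumes a toy DataFrame with invented values.

```python
import pandas as pd

# Toy data: the numbers are invented for illustration only
toy = pd.DataFrame({
    'Year': [1994, 1995, 1996],
    'Canada': [1.5, 1.6, 1.7],
    'USA': [1.1, 1.2, 1.3],
})

# Use Year as the index, then pick the 1995 row as a Series
prices_1995 = toy.set_index('Year').loc[1995]

# The Series index holds the country names, the values hold the prices
countriesList = prices_1995.index.tolist()
pricesList = prices_1995.tolist()

print(countriesList)
print(pricesList)
```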

In [None]:
#Create the Bar Chart

# Defining the plot size
plt.figure(figsize=(12, 8))


# Creating our own dataframe
data = {"CountriesList": countriesList,
        "PricesList": pricesList}
 
# Now convert this dictionary type data into a pandas dataframe
# specifying what are the column names
temp_df = pd.DataFrame(data, columns=['CountriesList', 'PricesList'])
 
 
# Defining the values for x-axis, y-axis
# and from which dataframe the values are to be picked
plots = sns.barplot(x="CountriesList", y="PricesList", data=temp_df)

# Iterating over the bars one-by-one to annotate
for bar in plots.patches:
    # Using Matplotlib's annotate function and passing the coordinates where the annotation shall be done
    # x-coordinate: bar.get_x() + bar.get_width() / 2
    # y-coordinate: bar.get_height()
    # free space to be left to make graph pleasing: (0, 8)
    # ha and va stand for the horizontal and vertical alignment
    plots.annotate(format(bar.get_height(), '.2f'),
                   (bar.get_x() + bar.get_width() / 2, bar.get_height()),
                   ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points')

# Setting the label for x-axis    
plt.xlabel("Countries", size=14)
# Setting the label for y-axis
plt.ylabel("Prices\nUSD", size=14)
# Setting the title for the graph
plt.title("Gas Prices 1995")
# Finally showing the plot
plt.show()