## Matplotlib - Other Chart Types

### Simple Bar Chart

The `plt.bar` function lets you create simple bar charts to compare multiple categories of data.

You call `plt.bar` with two arguments:

- The x-values: a list of x-positions for each bar  
- The y-values: a list of heights for each bar

In [None]:

from matplotlib import pyplot as plt

drinks = ["cappuccino", "latte", "chai", "americano", "mocha", "espresso"]
sales =  [91, 76, 56, 66, 52, 27]
x_values = range(len(drinks))

plt.bar(x_values, sales)
plt.show()
        

### Simple Bar Chart II

By creating an axes object, we can customise the tick marks to be more meaningful than the range of numbers used as the x-values.

-  Create an axes object using `ax = plt.subplot()`  


- Set the x-tick positions using a list of numbers: `ax.set_xticks([0, 1, 2, 3, 4, 5, 6, 7, 8])`. If your x-values are saved to a variable, a simple way to do this is `range(len(x_values_variable_name))`.  


- Set the x-tick labels using a list of strings: `ax.set_xticklabels(['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto'])`  


- If your labels are particularly long, you can use the `rotation` keyword to rotate your labels by a specified number of degrees: `ax.set_xticklabels(['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto'], rotation=30)`
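
The steps in the list above can be combined into one short sketch that also uses `rotation`. The planet names come from the list; the bar heights are placeholder numbers chosen only for illustration, not real measurements.

```python
from matplotlib import pyplot as plt

planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter',
           'Saturn', 'Uranus', 'Neptune', 'Pluto']
# Placeholder heights (1 to 9), used only so there is something to draw
heights = list(range(1, len(planets) + 1))
x_values = list(range(len(planets)))

ax = plt.subplot()
ax.bar(x_values, heights)

# One tick per bar, then one label per tick, rotated by 30 degrees
ax.set_xticks(x_values)
tick_labels = ax.set_xticklabels(planets, rotation=30)

plt.show()
```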

In [None]:

from matplotlib import pyplot as plt

drinks = ["cappuccino", "latte", "chai", "americano", "mocha", "espresso"]
sales =  [91, 76, 56, 66, 52, 27]
x_values = range(len(drinks))
plt.bar(x_values, sales)
plt.xlabel('Drink Type')
plt.ylabel('Sales')

ax = plt.subplot()
ax.set_xticks(x_values)
ax.set_xticklabels(drinks)


plt.show()

### Side-By-Side Bars

To create a side-by-side bar chart, you need two sets of data with the same type of axis values. This is useful for comparing things such as:
- The populations of two countries over time
- Prices for different foods at two different restaurants
- Enrollments in different classes for males and females  

Because the x-positions need to be offset to show the two sets of data side by side, we can assign a few values to variables and then use a list comprehension to create the list of x-positions for each dataset.

##### blue bars
```Python
n = 1   # This is our first dataset (out of 2)
t = 2   # Number of datasets
d = 7   # Number of sets of bars
w = 0.8 # Width of each bar
x_values1 = [t*element + w*n for element in range(d)]
```

##### orange bars
```Python
n = 2   # This is our second dataset (out of 2)
t = 2   # Number of datasets
d = 7   # Number of sets of bars
w = 0.8 # Width of each bar
x_values2 = [t*element + w*n for element in range(d)]
```
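
The two blocks above differ only in `n`, so the calculation can be wrapped in a small helper. This is a sketch rather than anything provided by Matplotlib: the function name `bar_positions` and the `tick_positions` list are hypothetical names introduced here for illustration.

```python
def bar_positions(n, t=2, d=7, w=0.8):
    """Return the x-positions for dataset n (counting from 1).

    t is the number of datasets, d the number of sets of bars,
    and w the width of each bar, as in the blocks above.
    """
    return [t * element + w * n for element in range(d)]

x_values1 = bar_positions(1)  # blue bars
x_values2 = bar_positions(2)  # orange bars

# Midpoint of each pair of bars, useful if you want a single tick
# centred under both bars instead of under the first one
tick_positions = [(a + b) / 2 for a, b in zip(x_values1, x_values2)]

print(x_values1[:3])
print(x_values2[:3])
```

Within each pair, the two positions are `w` apart, which matches the default bar width of 0.8, so the bars touch without overlapping. Each new pair starts `t` units after the previous one, which leaves a small gap between pairs.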

In [None]:

from matplotlib import pyplot as plt

drinks = ["cappuccino", "latte", "chai", "americano", "mocha", "espresso"]
sales1 =  [91, 76, 56, 66, 52, 27]
sales2 = [65, 82, 36, 68, 38, 40]

# creating x values for the first dataset
n = 1           # This is our first dataset (out of 2)
t = 2           # Number of datasets
d = len(drinks) # Number of sets of bars
w = 0.8         # Width of each bar
store1_x = [t*element + w*n for element in range(d)]

# creating x values for the second dataset
n = 2           # This is our second dataset (out of 2)
t = 2           # Number of datasets
d = len(drinks) # Number of sets of bars
w = 0.8         # Width of each bar
store2_x = [t*element + w*n for element in range(d)]

# plotting the 2 datasets on a bar chart  
plt.bar(store1_x, sales1)
plt.bar(store2_x, sales2)

# setting the axes tick marks to that of dataset 1 and labelling them using the drinks variable
ax = plt.subplot()
ax.set_xticks(store1_x)
ax.set_xticklabels(drinks)

# labelling the axes
plt.xlabel('Drink Type')
plt.ylabel('Sales')

# creating a legend
plt.legend(['Location 1','Location 2'])

# displaying the bar chart 
plt.show()


### Stacked Bars

If we want to compare two sets of data while preserving knowledge of the total between them, we can also stack the bars instead of putting them side by side.

We create the first set of bars as normal, but in the second `plt.bar()` call we pass the first dataset to the `bottom` keyword. This way, each bar of the second dataset starts where the corresponding bar of the first dataset ends.

E.g.
```Python
# dataset 1:  
video_game_hours = [1, 2, 2, 1, 2]
plt.bar(range(len(video_game_hours)), video_game_hours) 

# dataset 2:
book_hours = [2, 3, 4, 2, 1]
plt.bar(range(len(book_hours)), book_hours, bottom=video_game_hours)
```
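
If a third dataset were stacked on top, its `bottom` would need to be the element-wise sum of the two layers beneath it. A minimal sketch of that calculation, where `tv_hours` is a made-up third dataset added only for illustration:

```python
video_game_hours = [1, 2, 2, 1, 2]
book_hours = [2, 3, 4, 2, 1]
tv_hours = [1, 1, 2, 3, 1]  # hypothetical third dataset

# The third layer starts where the first two layers end
third_bottom = [v + b for v, b in zip(video_game_hours, book_hours)]

print(third_bottom)  # [3, 5, 6, 3, 3]
```

Passing `bottom=third_bottom` to a third `plt.bar()` call with `tv_hours` as the heights would then place the new bars on top of the existing stack.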

In [None]:

from matplotlib import pyplot as plt

drinks = ["cappuccino", "latte", "chai", "americano", "mocha", "espresso"]
sales1 =  [91, 76, 56, 66, 52, 27]
sales2 = [65, 82, 36, 68, 38, 40]
# creating x_values based on the range of the length of the drinks variable
x_values = range(len(drinks))

# plotting sales 1 on the bar chart
plt.bar(x_values, sales1)
# plotting sales 2 as stacked bar, using the sales1 as the starting point 
plt.bar(x_values, sales2, bottom = sales1)

# adding title and axis labels
plt.title('Total Drinks Sold Between Location 1 and 2')
plt.xlabel('Drink Type')
plt.ylabel('Sales')

# setting one tick per bar, then renaming the tick marks to the drink names
ax = plt.subplot()
ax.set_xticks(x_values)
ax.set_xticklabels(drinks)

# adding a legend
plt.legend(['Location 1','Location 2'])

plt.show()

### Error Bars

Sometimes you want to show the error range of a data point, for example:  
    
- The average number of students in a 3rd grade classroom is 30, but some classes have as few as 18 and others have as many as 35 students.  

- We measured that the weight of a certain fruit was 35g, but we know that our scale isn’t very precise, so the true weight of the fruit might be as much as 40g or as little as 30g.  

- The average price of a soda is £1.00, but we also want to communicate that the standard deviation is 20p.  


To achieve this, pass a y-axis error (`yerr`) to `plt.bar()` after the x and y values. You can also add `capsize`, a number that sets the length of the caps at the ends of the error bars.  


Base syntax:
```Python
values = [10, 13, 11, 15, 20]
yerr = [1, 3, 0.5, 2, 4]
plt.bar(range(len(values)), values, yerr=yerr, capsize=10)
plt.show()
```
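
The classroom example above is not symmetric: the smallest class is 12 below the average and the largest is only 5 above it. As a sketch of how that could be drawn, `yerr` also accepts a pair of lists, with the lower errors first and the upper errors second. The single bar below uses the numbers from that example; the variable names are illustrative.

```python
from matplotlib import pyplot as plt

average = [30]   # average class size
lowest = [18]    # smallest class
highest = [35]   # largest class

# Distance from the average down to the minimum and up to the maximum
lower_err = [a - low for a, low in zip(average, lowest)]
upper_err = [high - a for a, high in zip(average, highest)]

# yerr as [lower errors, upper errors] gives asymmetric error bars
plt.bar(range(len(average)), average, yerr=[lower_err, upper_err], capsize=10)
plt.show()
```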

In [None]:

from matplotlib import pyplot as plt

drinks = ["cappuccino", "latte", "chai", "americano", "mocha", "espresso"]
ounces_of_milk = [6, 9, 4, 0, 9, 0]
error = [0.6, 0.9, 0.4, 0, 0.9, 0]
x_range = range(len(drinks))

plt.bar(x_range, ounces_of_milk, yerr=error, capsize = 5)

plt.show()


### Fill Between - Line Graph Error Range

Just as you can add error bars to a bar chart, you can add a shaded area to a line graph to show the error margin. This can be done with:
- `plt.fill_between(x_values, y_lower, y_upper, alpha=#)`
- It uses the same x_values as the line graph
- It takes two additional lists: y_lower, then y_upper
- The alpha should be a number between 0 and 1 that specifies the transparency of the shading

y_lower and y_upper can be calculated with list comprehensions. For example:
- `y_lower = [i - 2 for i in y_values]` subtracts 2 from each element in your y_values
- `y_upper = [i + 2 for i in y_values]` adds 2 to each element in your y_values
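
To see what these two comprehensions produce, here is a small sketch with made-up `y_values`; the name `margin` is introduced only for this example.

```python
y_values = [10, 12, 15]  # made-up values for illustration
margin = 2

y_lower = [i - margin for i in y_values]
y_upper = [i + margin for i in y_values]

print(y_lower)  # [8, 10, 13]
print(y_upper)  # [12, 14, 17]
```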

In [None]:

from matplotlib import pyplot as plt

months = range(12)
month_names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
revenue = [16000, 14000, 17500, 19500, 21500, 21500, 22000, 23000, 20000, 19500, 18000, 16500]
# setting y_lower to 10 percent less than the revenue value
y_lower = [num * 0.9 for num in revenue]
# setting y_upper to 10 percent higher than revenue value
y_upper = [num * 1.1 for num in revenue]

ax = plt.subplot()
plt.fill_between(months, y_lower, y_upper, alpha=0.2)
plt.plot(months, revenue)
ax.set_xticks(months)
ax.set_xticklabels(month_names)

plt.show()

### Pie Charts

Pie charts are straightforward to create: you pass the values you want to display to `plt.pie()`.  

Example Syntax:
``` Python
budget_data = [500, 1000, 750, 300, 100]

plt.pie(budget_data)
plt.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle, and not squashed
plt.show()
```

In [None]:
from matplotlib import pyplot as plt

payment_method_names = ["Card Swipe", "Cash", "Apple Pay", "Other"]
payment_method_freqs = [270, 77, 32, 11]

plt.pie(payment_method_freqs)
plt.axis('equal')
plt.legend(payment_method_names)

plt.show()


### Pie Chart labelling

You can show the names of each part of a pie chart by adding a legend in the usual way, with `plt.legend(value_names)`. However, you can make the chart clearer by placing each name next to its slice, passing the names to `plt.pie()` through the `labels` keyword, e.g.:
```Python 
plt.pie(budget_data, labels=budget_categories)
```  

We can also add the percentage of the total that each slice represents. Matplotlib can add this automatically with the keyword `autopct`, to which we pass a format string that controls how the percentages are displayed. Some common formats are:

- `'%0.2f'` - 2 decimal places, like 4.08
- `'%0.2f%%'` - 2 decimal places, but with a percent sign at the end, like 4.08%. You need two consecutive percent signs because the first one acts as an escape character, so that the second one gets displayed on the chart.
- `'%d%%'` - truncated to a whole number and with a percent sign at the end, like 4%

So, a full call to plt.pie might look like:
```Python
plt.pie(budget_data,
        labels=budget_categories,
        autopct='%0.1f%%')
```    
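
These format strings follow Python's `%`-style string formatting, so they can be tried outside a chart. A small sketch, using a made-up percentage value:

```python
percentage = 4.0816  # made-up slice percentage for illustration

two_decimals = '%0.2f' % percentage
with_sign = '%0.2f%%' % percentage
whole_number = '%d%%' % percentage

print(two_decimals)   # 4.08
print(with_sign)      # 4.08%
print(whole_number)   # 4%

# %d drops the decimal part rather than rounding
print('%d%%' % 4.9)   # 4%
```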

In [None]:
from matplotlib import pyplot as plt

payment_method_names = ["Card Swipe", "Cash", "Apple Pay", "Other"]
payment_method_freqs = [270, 77, 32, 11]

plt.figure(figsize=(6,6))
plt.pie(payment_method_freqs, labels=payment_method_names, autopct='%0.1f%%')
plt.axis('equal')
plt.legend(payment_method_names)

plt.show()

### Histograms


#### What is a histogram?

Sometimes we want to get a feel for a large dataset with many samples beyond knowing just the basic metrics of mean, median, or standard deviation. To get more of an intuitive sense for a dataset, we can use a histogram to display all the values.

A histogram tells us how many values in a dataset fall between different sets of numbers (i.e., how many numbers fall between 0 and 10? Between 10 and 20? Between 20 and 30?). Each of these ranges is a bin; for instance, our first bin might be between 0 and 10.

In the histograms covered here, all bins are the same size. The width of each bin is the distance between its minimum and maximum values. In our example, the width of each bin would be 10.

Each bin is represented by a different rectangle whose height is the number of elements from the dataset that fall within that bin.
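
The counting that a histogram performs can be sketched by hand. The dataset below is made up, and the bins are the 0 to 10, 10 to 20 and 20 to 30 ranges from the example above.

```python
data = [3, 7, 12, 15, 18, 21, 25, 29]  # made-up dataset
bin_width = 10

# Map the start of each bin to the number of values that fall in it
counts = {}
for value in data:
    bin_start = (value // bin_width) * bin_width
    counts[bin_start] = counts.get(bin_start, 0) + 1

for bin_start in sorted(counts):
    print(f"{bin_start} to {bin_start + bin_width}: {counts[bin_start]}")
```

Each count is the height of the rectangle that would be drawn for that bin.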

#### Making Histograms with Matplotlib

To create a histogram, use `plt.hist(my_list)` with a list as the only argument. By default, this splits the data into 10 bins and displays the whole range.

If you want more than 10 bins, you can specify this like so:
```Python 
plt.hist(my_list, bins=40)
```

And if you want to look at a specific range with more than 10 bins, you can do so like this:
```Python 
plt.hist(dataset, range=(66,69), bins=40)
```
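
`plt.hist` also returns the counts and bin edges it calculated, which makes the effect of `range` and `bins` easy to inspect. A small sketch with a made-up dataset and only 3 bins so the numbers stay readable:

```python
from matplotlib import pyplot as plt

dataset = [66.1, 66.5, 67.2, 67.8, 68.4, 68.9, 70.3]  # made-up values

# Values outside the requested range (here 70.3) are ignored
counts, bin_edges, _ = plt.hist(dataset, range=(66, 69), bins=3)

print(counts)     # number of values in each bin
print(bin_edges)  # one more edge than there are bins

plt.show()
```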

In [None]:
## importing 2 dataframes to use for the histogram exercise

import pandas as pd

## replace with the locations where you have saved the data files
test_data_sales1 = pd.read_csv(r"C:\Users\andrew.morris\Documents\GitHub\murry_code\Codecademy Lesson Notes\test data\06b_test_data_histograms_sales1.csv") 
test_data_sales2 = pd.read_csv(r"C:\Users\andrew.morris\Documents\GitHub\murry_code\Codecademy Lesson Notes\test data\06b_test_data_histograms_sales2.csv") 

print(test_data_sales1.head())
print(test_data_sales2.head())

In [None]:
from matplotlib import pyplot as plt
import pandas as pd

sales_times = test_data_sales1.time_as_number
# print(sales_times.head())

plt.hist(sales_times, bins=20)

plt.show()

### Multiple Histograms

You can show multiple histograms on the same plot. Adding the keyword `alpha=#` lets you set a transparency level using a number between 0 and 1.

If you want to show these as lines rather than filled bars, you can add the keyword `histtype='step'`.

If the two datasets contain different numbers of values, you can add `density=True` to normalise each histogram so that its total area is 1. (Older Matplotlib releases called this keyword `normed`; it has since been removed.)

In [None]:
sales_times1 = test_data_sales1.time_as_number
sales_times2 = test_data_sales2.time_as_number

plt.hist(sales_times1, bins=20, alpha=0.4, density=True)
plt.hist(sales_times2, bins=20, alpha=0.4, density=True)

plt.show()

# print(sales_times1)
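
To see what normalising does, here is a sketch using the `density=True` keyword of current Matplotlib releases. The two made-up datasets have the same shape but very different sizes; once normalised, their bar heights match.

```python
from matplotlib import pyplot as plt

small = [1, 2, 2, 3]   # made-up dataset with 4 values
large = small * 100    # same shape, 400 values

# With density=True each histogram is scaled so its total area is 1
d_small, bin_edges, _ = plt.hist(small, bins=3, range=(0.5, 3.5),
                                 alpha=0.4, density=True)
d_large, _, _ = plt.hist(large, bins=3, range=(0.5, 3.5),
                         alpha=0.4, density=True)

print(d_small)
print(d_large)

plt.show()
```

Without `density=True`, the bars for `large` would be 100 times taller than those for `small`, which would make the two distributions hard to compare.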