### LSE Data Analytics Online Career Accelerator

# DA201: Data Analytics Using Python

## Practical activity: Customise your plots

**This is the solution to the activity.**

Canopy is a new boutique streaming company that wants to build an app that recommends titles based on the last movie you watched. Before making suggestions and recommendations to its clients, the company wants to visualise and understand the data.

This analysis uses the `movies.csv` and `ott.xlsx` data sets. Based on the available information, in this activity you will:

- customise the existing countplot with percentage labels and the histogram with a density (KDE) line.

## 1. Import the libraries

In [None]:
# Import necessary libraries.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

## 2. Import Excel file

In [None]:
# Load the Excel data using pd.read_excel.
ott = pd.read_excel('ott.xlsx')

# View the columns.
print(ott.columns)

## 3. Import CSV file

In [None]:
# Load the CSV data using pd.read_csv.
movies = pd.read_csv('movies.csv')

# View the columns.
print(movies.columns)

## 4. Validate the DataFrames

In [None]:
# Check that the movies data imported correctly: preview rows, shape, and data types.
print(movies.head())
print(movies.shape)
print(movies.dtypes)

In [None]:
# Check that the ott data imported correctly: preview rows, data types, and shape.
print(ott.head())
print(ott.dtypes)
print(ott.shape)
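
Beyond `head()`, `shape`, and `dtypes`, it can help to check the join key for missing or repeated values before merging. The sketch below uses a small made-up DataFrame: the values and the `Title` column are hypothetical, and only the `ID` column name is taken from this activity.

```python
import pandas as pd

# Hypothetical stand-in for one of the imported DataFrames.
toy = pd.DataFrame({'ID': [1, 2, 2, 4],
                    'Title': ['A', 'B', 'B', None]})

# Count missing values per column.
missing_per_column = toy.isna().sum()

# Count rows whose ID repeats an earlier row.
duplicate_ids = toy['ID'].duplicated().sum()

print(missing_per_column)
print(duplicate_ids)
```

The same two checks could be run on `movies` and `ott`. Duplicate keys on either side would add rows to the merged result.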

## 5. Combine the two DataFrames

In [None]:
# Merge the two DataFrames.
mov_ott = pd.merge(movies, ott, how='left', on='ID')

# View the DataFrame.
print(mov_ott.shape)
mov_ott.head()
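
To see what `how='left'` does, here is a minimal sketch on two made-up tables (`movies_toy` and `ott_toy` are hypothetical, as are their values). It keeps every row of the left table and uses the `indicator=True` option of `pd.merge` to flag rows that found no match on the right.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables, joined on 'ID'.
movies_toy = pd.DataFrame({'ID': [1, 2, 3], 'Title': ['A', 'B', 'C']})
ott_toy = pd.DataFrame({'ID': [1, 2, 4], 'Netflix': [1, 0, 1]})

# how='left' keeps every row of the left table.
# indicator=True adds a '_merge' column showing where each row was found.
merged_toy = pd.merge(movies_toy, ott_toy, how='left', on='ID', indicator=True)

# Rows found only in the left table have missing values in the right-hand columns.
unmatched = merged_toy[merged_toy['_merge'] == 'left_only']

print(merged_toy)
print(len(unmatched))
```

If the number of unmatched rows in the real data is large, the plots that rely on columns from `ott` will silently drop those rows.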

## 6. Create a countplot

In [None]:
# Create a countplot based on number of movies streamed by Netflix per age group.
sns.countplot(x='Age',
              hue='Netflix',
              data=mov_ott)
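
The numbers behind a countplot with `hue` can be cross-checked with a frequency table. This is a sketch on made-up data (the age groups and flags below are hypothetical), using `pd.crosstab` to count rows per combination of `Age` and `Netflix`.

```python
import pandas as pd

# Hypothetical sample with the two columns used in the countplot.
toy = pd.DataFrame({'Age': ['7+', '7+', '18+', '18+', '18+'],
                    'Netflix': [1, 0, 1, 1, 0]})

# Rows are age groups, columns are the Netflix flag, cells are row counts.
counts = pd.crosstab(toy['Age'], toy['Netflix'])

print(counts)
```

Each cell in the table corresponds to the height of one bar in the plot.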

## 7. Create a histogram

In [None]:
# Create a histogram based on IMDb ratings.
sns.histplot(data=mov_ott,
             x='IMDb',
             binwidth=1)

## 8. Create a scatterplot

In [None]:
# Create scatterplot with two variables (IMDb and Rotten Tomatoes).
sns.scatterplot(x='IMDb',
                y='Rotten Tomatoes',
                data=mov_ott)

## 9. Create a lineplot

In [None]:
# Create a simple lineplot.
sns.lineplot(x='Year',
             y='IMDb',
             data=mov_ott)

In [None]:
# Create a simple lineplot without the confidence interval.
# Note: seaborn 0.12+ deprecates ci in favour of errorbar=None.
sns.lineplot(x='Year',
             y='IMDb',
             data=mov_ott,
             ci=None)

In [None]:
# Create a lineplot for selected age groups (16+ and 18+), one line per group.
sns.lineplot(x='Year',
             y='IMDb',
             data=mov_ott[mov_ott['Age'].isin(['16+', '18+'])],
             hue='Age')

In [None]:
# Create the same lineplot for selected age groups without the confidence interval.
sns.lineplot(x='Year',
             y='IMDb',
             data=mov_ott[mov_ott['Age'].isin(['16+', '18+'])],
             hue='Age',
             ci=None)
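
When several rows share the same x value, `sns.lineplot` by default plots their mean and draws a confidence band around it. The line itself can be reproduced with a `groupby`, sketched here on made-up values (the years and ratings are hypothetical).

```python
import pandas as pd

# Hypothetical ratings with repeated years.
toy = pd.DataFrame({'Year': [2018, 2018, 2019, 2019, 2019],
                    'IMDb': [6.0, 8.0, 5.0, 6.0, 7.0]})

# Mean rating per year: the values the line passes through by default.
mean_by_year = toy.groupby('Year')['IMDb'].mean()

print(mean_by_year)
```

Adding `'Age'` to the `groupby` keys would give one series per age group, matching the `hue='Age'` version of the plot.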

## 10. Customise plots

### Barplot

In [None]:
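# Subset the data to movies released from 2010 onwards.
mov_ott_2010 = mov_ott[mov_ott['Year'] >= 2010]

# Create a countplot of movies per year.
ax = sns.countplot(x='Year',
                   data=mov_ott_2010)

# The bars show counts; the percentages are added as labels below.
ax.set(ylabel='Count')

# Total number of movies in the subset.
total = len(mov_ott_2010['Year'])

# Annotate each bar with its share of the total.
for p in ax.patches:
    percentage = '{:.1f}%'.format(100 * p.get_height()/total)
    x = p.get_x()
    y = p.get_y() + p.get_height()
    ax.annotate(percentage, (x, y))

# Rotate the x-axis labels so they do not overlap.
plt.xticks(rotation=90)
plt.show()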
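
The percentage labels on the bars can be cross-checked without plotting. The sketch below uses a made-up `Year` series (the values are hypothetical) and `value_counts(normalize=True)` to compute each year's share of the total.

```python
import pandas as pd

# Hypothetical release years.
years = pd.Series([2010, 2010, 2011, 2012], name='Year')

# Share of rows per year, as a percentage rounded to one decimal place.
percent_by_year = (years.value_counts(normalize=True).sort_index() * 100).round(1)

print(percent_by_year)
```

Applied to `mov_ott_2010['Year']`, the result should agree with the labels drawn by the annotation loop.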

### Histogram

In [None]:
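# Create a histogram of IMDb ratings with a KDE line.
# displot is figure-level and returns a FacetGrid, not an Axes.
# palette is omitted because it only applies when hue is set.
g = sns.displot(data=mov_ott,
                x='IMDb',
                bins=10,
                kind='hist',
                aspect=1.4,
                kde=True)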

plt.show()
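
Setting `bins=10` splits the observed range into ten equal-width bins, so the bin width depends on the data, unlike `binwidth=1` in the earlier histogram. As a rough illustration, and assuming NumPy's binning behaves the same way as the plot's, the sketch below bins a made-up array of ratings (the values are hypothetical).

```python
import numpy as np

# Hypothetical ratings between 1 and 10.
ratings = np.array([1.0, 2.5, 3.0, 4.5, 6.0, 7.5, 8.0, 9.0, 10.0])

# Ten equal-width bins between the minimum and maximum values.
counts, edges = np.histogram(ratings, bins=10)

# Width of each bin: (max - min) / number of bins.
bin_width = edges[1] - edges[0]

print(counts)
print(edges)
print(bin_width)
```

Ten bins always produce eleven edges, and every observation falls into exactly one bin.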