# 1 Introduction
In this notebook you will learn how to: 
1. Modify seaborn plots
2. Visualize different groups of data on the same plot
3. Filter data

# 2 Import Packages
As usual, we load the required packages for this notebook

In [None]:
import pandas as pd               # for data manipulation
import matplotlib.pyplot as plt   # for plotting 
import seaborn as sns             # an extension of matplotlib for statistical graphics

# 3 Import Dataset
We load the data that we will work on.

In [None]:
orders = pd.read_csv('../input/orders.csv' )
orders.head()

# 4 Solution to the previous week's assignment

Use the above analysis to find out which day of the week has the most orders. To answer this question, you will need to use the **order_dow** column of orders DataFrame.

In [None]:
sns.countplot(x='order_dow', data=orders )

Which day has the most orders and which the fewest?

As you can see, the resulting plot is small and difficult to interpret. Now try to:
1. Create the same plot with dimensions 10x10
2. Use a color of your choice for the bars - the names of all available colors can be found here: [Available colors on Seaborn](https://python-graph-gallery.com/100-calling-a-color-with-seaborn/)
3. Add proper labels to the axes and a title to the plot

In [None]:
plt.figure(figsize=(10,10))
sns.countplot(x="order_dow", data=orders, color='red')
plt.ylabel('Orders', fontsize=10)
plt.xlabel('Day of the Week', fontsize=10)
plt.title("Orders per Day", fontsize=15)
plt.show()

Now create a DataFrame that keeps only the first order of each customer

In [None]:
orders_first = orders[orders['order_number'] == 1]
orders_first.head(20)

And a DataFrame that keeps only the second order of each customer

In [None]:
orders_second = orders.loc[orders['order_number'] == 2]
orders_second.head(20)
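
The two cells above filter rows with a boolean condition, once with plain brackets and once with .loc. The following is a minimal sketch of what happens behind the scenes, using a small invented DataFrame (`toy_orders` and its values are assumptions for illustration, not part of the real dataset):

```python
import pandas as pd

# Toy stand-in for the orders table (values invented for illustration)
toy_orders = pd.DataFrame({
    'user_id':      [1, 1, 1, 2, 2],
    'order_number': [1, 2, 3, 1, 2],
})

# The comparison produces a boolean Series, one True/False per row
is_first = toy_orders['order_number'] == 1

# Both forms keep only the rows where the mask is True
first_a = toy_orders[is_first]
first_b = toy_orders.loc[is_first]

print(is_first.tolist())
print(first_a.equals(first_b))
```

For a plain row filter like this one, the two forms give the same result.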

Now create two subplots of **order_dow**, one for first orders and one for second orders.

In [None]:
# create a figure with two plots, one below the other
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(15,8))

# assign each plot to the appropriate axes
sns.countplot(ax=axes[0], x='order_dow', data=orders_first, color='red')
sns.countplot(ax=axes[1], x='order_dow', data=orders_second, color='red')

# produce the final plot
plt.show()

# 5 Modifying the aesthetics of Seaborn plots
## 5.1 Manually edit the ticks on x-axis of a plot
Recall the plot that we produced in the week 4 notebook:

In [None]:
plt.figure(figsize=(15,5))
sns.countplot(x='order_number', data=orders, color='red')
plt.ylabel('Total Customers', fontsize=10)
plt.xlabel('Total Orders', fontsize=10)
plt.title('How many orders do customers make?')
plt.show()

Now we show how to manually modify the x-ticks (the overlapping numbers). To do this we:
* Assign the produced plot to a variable (here we name it 'graph')
* Use the .set( ) method to set several aesthetic parameters in one step [aesthetics definition; [ref.1 ](https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/visual-aesthetics), [ref.2](http://www.visual-arts-cork.com/definitions/aesthetics.htm)]

Note that this works because sns.countplot returns the matplotlib Axes object it draws on.

In [None]:
plt.figure(figsize=(15,5))
graph=sns.countplot(x='order_number', data=orders, color='red')
graph.set(xticks=[25,50,75,100], xticklabels=['25 orders','50 orders', '75 orders', '100 orders'] )
plt.ylabel('Total Customers', fontsize=10)
plt.xlabel('Total Orders', fontsize=10)
plt.title('How many orders do customers make?')
plt.show()

Have a look at the xticks & xticklabels arguments of the .set( ) method:
* xticks=[25,50,75,100] indicates where on the x-axis to place ticks
* xticklabels=['25 orders','50 orders', '75 orders', '100 orders'] indicates what label to show at each tick

The xticklabels can be any text, but xticks must be valid positions on the x-axis. Because countplot treats x='order_number' as categorical, these positions count the bars from 0, so position 25 is the 26th bar.
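
To make the relation between tick positions and values concrete, here is a small sketch. It assumes, as seaborn's countplot does by default for numeric data, that one bar is drawn per distinct value, in sorted order, at positions 0, 1, 2 and so on. The Series `toy` and its values are invented for illustration:

```python
import pandas as pd

# Toy order numbers (invented); the real column is much longer
toy = pd.Series([1, 1, 2, 3, 3, 3, 4])

# One bar per distinct value, in sorted order
bar_values = sorted(toy.unique())

# Bar positions on a categorical axis start at 0
position_to_value = {pos: int(val) for pos, val in enumerate(bar_values)}
print(position_to_value)   # {0: 1, 1: 2, 2: 3, 3: 4}
```

So a tick placed at position 0 sits under the bar for the smallest value, not under a value of 0.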

### 5.1.1 Create a sequence for x-ticks; the use of built-in function range( ) 
Now we show how to create a sequence of numbers to use as x-ticks. To create a sequence of numbers we use Python's built-in function range( ). The range( ) function takes up to three arguments: the starting number, the stopping number (which is not included in the result) and the step.

In [None]:
rg = list(range(0,101,10))
rg

In the cell above we request a sequence starting from 0, stopping before 101, with step 10, so the last value is 100. To see the values of a range we need to pass it to the list( ) function.
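
A few more calls may help show how the stop value behaves. This is only an illustrative sketch; the variable name `tens` is made up here:

```python
# range(start, stop, step): the stop value itself is never included
tens = list(range(0, 101, 10))
print(tens)                          # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# With stop=100 the sequence ends at 90
print(list(range(0, 100, 10))[-1])   # 90

# start defaults to 0 and step defaults to 1
print(list(range(5)))                # [0, 1, 2, 3, 4]
```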

Now we use the above sequence as the argument for both xticks & xticklabels in the plot produced earlier.

In [None]:
plt.figure(figsize=(15,5))
graph=sns.countplot(x='order_number', data=orders, color='red')
graph.set( xticks=rg, xticklabels=list( range(0,101,10) ) )
plt.ylabel('Total Customers', fontsize=10)
plt.xlabel('Total Orders', fontsize=10)
plt.title('How many orders do customers make?')
plt.show()

### 5.1.2 Rotate the x-ticks
Now we show how to rotate the x-tick labels with the matplotlib library: we call plt.xticks with the argument rotation='vertical'.

In [None]:
plt.figure(figsize=(15,5))
graph=sns.countplot(x='order_number', data=orders, color='red')
graph.set( xticks=list( range(0,101,10) ), xticklabels=list( range(0,101,10) ) )
plt.ylabel('Total Customers', fontsize=10)
plt.xlabel('Total Orders', fontsize=10)
plt.title('How many orders do customers make?')
plt.xticks(rotation='vertical')
plt.show()

## 5.2 Modify the color palette
Now we show how to use different combinations of colours (palettes) to colour the bars.
To do this, instead of the color argument of the countplot function, we use the argument **palette=**<br>
Then we select one of the colour palettes available in seaborn; [ref.1 ](https://python-graph-gallery.com/101-make-a-color-palette-with-seaborn/), [ref.2](https://seaborn.pydata.org/tutorial/color_palettes.html)


In [None]:
plt.figure(figsize=(15,5))
graph=sns.countplot(x='order_number', data=orders, palette='Set1',saturation=1)
graph.set( xticks=list( range(0,101,10) ), xticklabels=list( range(0,101,10) ) )
plt.ylabel('Total Customers', fontsize=10)
plt.xlabel('Total Orders', fontsize=10)
plt.title('How many orders do customers make?')
plt.xticks(rotation='vertical')
plt.show()

## 5.3 Add grid to plots
Now we add a grid to the plot via matplotlib. To do this we call plt.grid(True).

In [None]:
plt.figure(figsize=(15,5))
graph=sns.countplot(x='order_number', data=orders, palette='Reds')
graph.set( xticks=list( range(0,101,10) ), xticklabels=list( range(0,101,10) ) )
plt.ylabel('Total Customers', fontsize=10)
plt.xlabel('Total Orders', fontsize=10)
plt.title('How many orders do customers make?')
plt.xticks(rotation='vertical')
plt.grid(True)
plt.show()

## 5.4 Zoom-in to specific area of the plot
Now we want to see in more detail how many customers placed more than 50 orders. To achieve this we use the xlim and ylim functions of matplotlib.

In [None]:
plt.figure(figsize=(15,5))
graph=sns.countplot(x='order_number', data=orders, palette='Reds')
graph.set( xticks=list( range(0,101,10) ), xticklabels=list( range(0,101,10) ) )
plt.ylabel('Total Customers', fontsize=10)
plt.xlabel('Total Orders', fontsize=10)
plt.title('How many orders do customers make?')
plt.xticks(rotation='vertical')
plt.xlim(50,100)
plt.ylim(0,10000)
plt.grid(True)
plt.show()

With xlim & ylim we limited our plot to values between 50 and 100 on the x-axis and between 0 and 10000 on the y-axis.
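
The same idea can be tried on a tiny bar chart. This is a sketch with invented bar heights; it uses the Axes methods set_xlim and set_ylim, which do the same job as plt.xlim and plt.ylim. The `matplotlib.use('Agg')` line is only there so the snippet also runs outside a notebook and can be dropped inside one:

```python
import matplotlib
matplotlib.use('Agg')   # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(range(10), [50, 40, 30, 20, 10, 8, 6, 4, 2, 1])   # invented heights

# The data is unchanged; only the visible window moves
ax.set_xlim(4.5, 9.5)   # show the bars at positions 5 to 9
ax.set_ylim(0, 12)      # shrink the y-range so the short bars are readable

print(ax.get_xlim(), ax.get_ylim())
```

Note that all ten bars still exist in the plot; the limits only decide which part of it is shown.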

## 5.5 Your turn 📚📝
Modify the plot of frequency of orders by hour of day. You need to:
1. Zoom in to hours 0 to 6. You need a slightly wider range to properly show each bar. <br>(Hint: you can use negative values and also half values, e.g. -0.5)
2. Use the proper range of y-axis
3. Add a grid to the plot
4. Use a colour palette instead of a single colour.



In [None]:
#modify the existing code
plt.figure(figsize=(15,5))
sns.countplot(x="order_hour_of_day", data=orders, palette="Reds")
plt.ylabel('Total Orders', fontsize=10)
plt.xlabel('Hour of day', fontsize=10)
plt.title("Frequency of order by hour of day", fontsize=15)
plt.xlim(-0.5,6.5)
plt.ylim(0,35000)
plt.grid(True)
plt.show()

# 6 Create a countplot with groups
Recall the plot that we have created for the order_hour_of_day:

In [None]:
plt.figure(figsize=(15,5))
sns.countplot(x="order_hour_of_day", data=orders, color='red')
plt.ylabel('Total Orders', fontsize=10)
plt.xlabel('Hour of day', fontsize=10)
plt.title("Frequency of order by hour of day", fontsize=15)
plt.show()

Now we will visualise the distributions of different days (order_dow) and different hours (order_hour_of_day) on the same plot.

To do this, we use the argument **hue**, which splits one variable based on another variable. In our case we split **order_hour_of_day** using the **order_dow** variable.
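
To see which numbers such a grouped plot represents, the counts can also be computed directly with pandas. This is a sketch on an invented mini-sample (`toy` and its values are assumptions for illustration):

```python
import pandas as pd

# Invented mini-sample of the two columns
toy = pd.DataFrame({
    'order_hour_of_day': [9, 9, 9, 10, 10, 10],
    'order_dow':         [0, 0, 1,  0,  1,  1],
})

# One count per (hour, day) pair: these are the bar heights that a
# countplot with x='order_hour_of_day' and hue='order_dow' would draw
counts = toy.groupby(['order_hour_of_day', 'order_dow']).size()
print(counts)
```

Each hour on the x-axis gets one bar per day, and the height of that bar is the count for that (hour, day) pair.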

In [None]:
plt.figure(figsize=(15,5))
sns.countplot(x="order_hour_of_day", data=orders, color='red',  hue='order_dow')
plt.ylabel('Total Orders', fontsize=10)
plt.xlabel('Hour of day', fontsize=10)
plt.title("Frequency of order by hour of day", fontsize=15)
plt.show()

What we get here is a plot that shows, for each day, the orders placed in each hour.
Below we modify the aesthetics of the plot and we use a colour palette instead of a single colour. 

In [None]:
plt.figure(figsize=(15,5))
sns.countplot(x="order_hour_of_day", data=orders, palette='RdBu', hue='order_dow')
plt.ylabel('Total Orders', fontsize=10)
plt.xlabel('Hour of day', fontsize=10)
plt.title("Frequency of order by hour of day", fontsize=15)
plt.show()

# 7 Advanced Filtering
In the following examples we show how we can filter our data based on specific criteria.
## 7.1 Select the first four orders from each customer
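The column order_number holds the position of each order in a customer's sequence of orders, so keeping the rows with order_number<=4 selects each customer's first four orders.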

In [None]:
first_four_orders = orders[orders.order_number<=4]
first_four_orders.head()

Now that we have the first four orders from each customer we will produce countplots for the day and the hour of the day for each order_number:

In [None]:
plt.figure(figsize=(15,5))
sns.countplot(x="order_dow", data=first_four_orders, palette='RdBu', hue='order_number')
plt.ylabel('Total Orders', fontsize=10)
plt.xlabel('Day of the week', fontsize=10)
plt.title("Frequency of order by day of the week", fontsize=15)
plt.show()

In [None]:
plt.figure(figsize=(15,5))
sns.countplot(x="order_hour_of_day", data=first_four_orders, palette='RdBu', hue='order_number')
plt.ylabel('Total Orders', fontsize=10)
plt.xlabel('Hour of day', fontsize=10)
plt.title("Frequency of order by hour of day", fontsize=15)
plt.show()

And a pandas boxplot for the **days_since_prior_order** for each order_number

In [None]:
first_four_orders.boxplot(by='order_number', column=['days_since_prior_order'], figsize=(10,5))
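
The statistics that a grouped boxplot summarises can also be inspected as numbers. Below is a sketch on invented data (`toy` is an assumption for illustration) that computes the median of each group, which is the line drawn inside each box:

```python
import pandas as pd

# Invented values for two order numbers
toy = pd.DataFrame({
    'order_number':           [2, 2, 2, 3, 3, 3],
    'days_since_prior_order': [5.0, 7.0, 30.0, 3.0, 8.0, 10.0],
})

# Median per group: the line drawn inside each box
medians = toy.groupby('order_number')['days_since_prior_order'].median()
print(medians)
```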

## 7.2 Select the orders from users with more than 10 orders
To select the orders from users with more than 10 orders, we first need to create a Series with the user_ids that have more than 10 orders. A user has more than 10 orders exactly when one of their orders has order_number equal to 11, so we start with a condition that is True for those rows.

In [None]:
eleven = orders.order_number==11
eleven.head()

And now we keep the user_id values where the condition is True

In [None]:
user_10 = orders.user_id[orders.order_number==11]
user_10.head()

In [None]:
user_10.shape

This gives 101,696 unique user_ids (customers).

And now we keep from orders all the rows whose user_id .isin( ) the user_10 Series.
The method .isin( ), called here on the user_id Series, returns a boolean Series showing whether each element is contained in the given values.

In [None]:
orders_10 = orders[orders.user_id.isin(user_10)]
orders_10.head()

In [None]:
orders_10.shape

This gives 2,757,619 orders.
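
The whole procedure can be followed step by step on a small example. This is a sketch with an invented DataFrame and a lower threshold (more than 2 orders instead of more than 10); `toy`, `heavy_users` and `toy_heavy` are made-up names:

```python
import pandas as pd

toy = pd.DataFrame({
    'user_id':      [1, 1, 1, 2, 2, 3, 3, 3],
    'order_number': [1, 2, 3, 1, 2, 1, 2, 3],
})

# Users that reached a third order, i.e. have more than 2 orders
heavy_users = toy.user_id[toy.order_number == 3]

# isin() gives one True/False per row of user_id
mask = toy.user_id.isin(heavy_users)

# Keep every order of those users, not just the third one
toy_heavy = toy[mask]
print(toy_heavy.user_id.unique().tolist())   # [1, 3]
print(len(toy_heavy))                        # 6
```

User 2 has only two orders, so all of that user's rows are dropped.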

In [None]:
plt.figure(figsize=(15,5))
sns.countplot(x="order_dow", data=orders_10, color='red')
plt.ylabel('Total Orders', fontsize=10)
plt.xlabel('Day of the week', fontsize=10)
plt.title("Frequency of order by day of the week", fontsize=15)
plt.show()

In [None]:
plt.figure(figsize=(15,5))
graph=sns.countplot(x='order_number', data=orders_10, color='red')
graph.set( xticks=list( range(0,101,10) ), xticklabels=list( range(0,101,10) ) )
plt.ylabel('Total Customers', fontsize=10)
plt.xlabel('Total Orders', fontsize=10)
plt.title('How many orders do customers make?')
plt.show()

## 7.3 Your turn 📚📝
Now select the users with more than 20 orders and keep only the columns 'order_id', 'user_id', 'order_dow', 'order_hour_of_day', 'days_since_prior_order' in a single line with the .loc[ ] method

In [None]:
twentyone = orders.user_id[orders.order_number==21]
orders_20 = orders.loc[orders.user_id.isin(twentyone),  ['order_id', 'user_id', 'order_dow', 'order_hour_of_day', 'days_since_prior_order' ]]
orders_20.head()
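
The .loc[ ] call above selects rows and columns at the same time. A minimal sketch of that pattern on an invented DataFrame (`toy` and `subset` are made-up names):

```python
import pandas as pd

toy = pd.DataFrame({
    'order_id':     [10, 11, 12, 13],
    'user_id':      [1, 1, 2, 2],
    'order_number': [1, 2, 1, 2],
    'order_dow':    [0, 3, 5, 6],
})

# .loc[rows, columns]: a boolean mask selects rows, a list selects columns
subset = toy.loc[toy.order_number == 2, ['order_id', 'order_dow']]
print(subset)
```

The part before the comma decides which rows are kept, the part after it decides which columns.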

And now create three subplots, one each for orders, orders_10 and orders_20, each showing a boxplot of days_since_prior_order

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=3,figsize=(15,7))

orders.boxplot(column='days_since_prior_order', ax=axes[0])
orders_10.boxplot(column='days_since_prior_order',  ax=axes[1])
orders_20.boxplot(column='days_since_prior_order',  ax=axes[2])