# Boosting Online Retail Sales
<span id="0"></span>
<hr/>

### Overview
<hr/>

This notebook aims to boost the sales of an online retailer by applying a predictive approach to every action.

There are several ways to achieve this, and each has its pros and cons. This notebook takes a broad view: it combines different data science techniques and introduces the marketing concepts behind them.

The methods applied here combine data analysis, programming and machine learning, and are organized as follows:
1. [Loading and Checking Data](#1)
2. [Data Preprocessing](#2)
3. [Creating Models](#3)
4. [Conclusion](#4)

The dataset used in this notebook is available from Kaggle at the following link: https://www.kaggle.com/datasets/vijayuv/onlineretail.

### <span id="1"></span>  1. Loading and Checking Data
##### [Return to overview](#0)
<hr/>


Before any analysis, we need to set up the environment. To do this, I first import some modules and then read the data; the outputs below show various details about the dataset.
We start by importing the libraries we will need:

In [30]:
# import libraries
from datetime import datetime, timedelta
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
#from __future__ import division

#import plotly.plotly as py
import plotly.offline as pyoff
import plotly.graph_objs as go

#initiate visualization library for jupyter notebook 
pyoff.init_notebook_mode()

We read the data with pandas and display the first rows to see what information it contains:

In [31]:
tx_data = pd.read_csv('../input/onlineretail/OnlineRetail.csv', encoding='latin1')
tx_data.head(10)
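
Beyond `head()`, a quick structural check can help confirm column types and missing values before any aggregation. The sketch below runs on a small hypothetical frame that reuses column names from this notebook (`InvoiceDate`, `Quantity`, `UnitPrice`); the helper name `quick_check` and the sample values are assumptions for illustration, not part of the dataset.

```python
import pandas as pd

def quick_check(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and number of missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })

# Hypothetical sample rows, not taken from the real file
sample = pd.DataFrame({
    "InvoiceDate": ["12/1/2010 8:26", "12/1/2010 8:28", None],
    "Quantity": [6, 2, 3],
    "UnitPrice": [2.55, None, 1.25],
})

report = quick_check(sample)
print(report)
```

On the real data, the same helper could be called as `quick_check(tx_data)`.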

Next, we build a dataframe that shows the revenue for each month:

In [32]:
#converting the type of Invoice Date Field from string to datetime.
tx_data['InvoiceDate'] = pd.to_datetime(tx_data['InvoiceDate'])

#creating YearMonth field for the ease of reporting and visualization
tx_data['InvoiceYearMonth'] = tx_data['InvoiceDate'].map(lambda date: 100*date.year + date.month)

#calculate Revenue for each row and create a new dataframe with YearMonth - Revenue columns
tx_data['Revenue'] = tx_data['UnitPrice'] * tx_data['Quantity']
tx_revenue = tx_data.groupby(['InvoiceYearMonth'])['Revenue'].sum().reset_index()
tx_revenue
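
To make the aggregation concrete, here is a minimal sketch on a few hypothetical rows (the values are invented for illustration). It builds the same `InvoiceYearMonth` key with the vectorized `.dt` accessor instead of `map` with a lambda; both produce `year * 100 + month`, so this is a stylistic alternative rather than a different method.

```python
import pandas as pd

# Hypothetical transactions, not taken from the real dataset
toy = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(["2010-12-01", "2010-12-15", "2011-01-04"]),
    "UnitPrice": [2.0, 1.5, 4.0],
    "Quantity": [3, 4, 5],
})

# Integer key such as 201012 for December 2010
toy["InvoiceYearMonth"] = toy["InvoiceDate"].dt.year * 100 + toy["InvoiceDate"].dt.month

# Revenue per row, then summed per month
toy["Revenue"] = toy["UnitPrice"] * toy["Quantity"]
toy_revenue = toy.groupby("InvoiceYearMonth")["Revenue"].sum().reset_index()
print(toy_revenue)
```

The two December rows collapse into a single monthly total, and January keeps its own row.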

The graph below shows how monthly revenue evolves over time. It shows that revenue has increased in recent months.

In [33]:
#X and Y axis inputs for Plotly graph. We use Scatter for line graphs
plot_data = [
    go.Scatter(
        x=tx_revenue['InvoiceYearMonth'],
        y=tx_revenue['Revenue'],
    )
]

plot_layout = go.Layout(
        xaxis={"type": "category"},
        title='Monthly Revenue'
    )
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)

Next, we compute the monthly revenue growth rate.

In [34]:
#using pct_change() function to see monthly percentage change
tx_revenue['MonthlyGrowth'] = tx_revenue['Revenue'].pct_change()

#showing first 5 rows
tx_revenue.head()

#visualization - line graph
plot_data = [
    go.Scatter(
        x=tx_revenue.query("InvoiceYearMonth < 201112")['InvoiceYearMonth'],
        y=tx_revenue.query("InvoiceYearMonth < 201112")['MonthlyGrowth'],
    )
]

plot_layout = go.Layout(
        xaxis={"type": "category"},
        title='Monthly Growth Rate'
    )

fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)

The chart shows that revenue grew by 36.5% in November 2011, the last month plotted (the query excludes December 2011). Growth was not steady, however: revenue rose in some months and fell in others. The next step is a deeper assessment to identify the reasons for these declines.
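
For reference, `pct_change()` computes `(Revenue[t] - Revenue[t-1]) / Revenue[t-1]` for each month `t`, which is why the first month has no growth value. A minimal sketch with hypothetical revenue figures (not the real ones) compares the built-in call with the manual formula:

```python
import pandas as pd

# Hypothetical monthly revenue, indexed by a YearMonth key
revenue = pd.Series([100.0, 150.0, 120.0], index=[201101, 201102, 201103])

# Built-in growth rate relative to the previous row
growth = revenue.pct_change()

# Same quantity written out by hand with shift()
previous = revenue.shift(1)
manual = (revenue - previous) / previous

print(pd.DataFrame({"growth": growth, "manual": manual}))
```

A positive value means revenue rose relative to the previous month, and a negative value means it fell.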