<h2><b>Future Sales Prediction</b></h2>
<h4><b>Author:</b> Data Science @ Georgia Tech</h4>
<p><b>Reference:</b> <a href="https://medium.com/coders-camp/225-machine-learning-projects-with-python-44d6ea8ace18">Medium</a></p>

<b>Welcome to the Future Sales Prediction self-guided project!</b>

We will explore how predicting sales can help manage the manufacturing and advertising costs of a product.

As always, let's start by importing the modules we will need for the project.

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

First, let's read in the dataset.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Reading the Dataset</b></font></summary>
  <pre>
    <code style="display: block;">
      # Solution
      data = pd.read_csv("advertising.csv")
      print(data.head())
    </code>
  </pre>
</details>

The column descriptions are given below:

<ul>
    <li><code>TV</code>: Amount spent on TV advertising, in dollars</li>
    <li><code>Radio</code>: Amount spent on Radio advertising, in dollars</li>
    <li><code>Newspaper</code>: Amount spent on Newspaper advertising, in dollars</li>
    <li><code>Sales</code>: Number of units sold</li>
</ul>

Check how many null values we have in this dataset by column.

What does this tell you about the data we have so far?

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Finding Null Data</b></font></summary>
  <pre>
    <code style="display: block;">
      # Null data by column solution
      print(data.isnull().sum())
    </code>
  </pre>
</details>
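
To see what the per-column null count looks like when values really are missing, here is a minimal sketch on a small made-up DataFrame. The frame `toy` and its values are hypothetical and are not taken from `advertising.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from advertising.csv) with two missing entries.
toy = pd.DataFrame({
    "TV": [230.1, np.nan, 17.2],
    "Radio": [37.8, 39.3, np.nan],
    "Sales": [22.1, 10.4, 12.0],
})

# isnull() marks each missing cell as True; sum() counts the True values per column.
null_counts = toy.isnull().sum()
print(null_counts)
```

A count of zero for every column means no values are missing, so no rows need to be dropped or filled in before modeling.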

Now we will examine the relationship between variables.

The first one we are going to examine is the relationship between TV advertising and number of units sold.

Create a scatterplot to examine the relationship between <b>TV advertising</b> and the <b>number of units sold</b>.

What does the scatterplot tell you about the relationship between these variables?

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Create Scatterplot (1)</b></font></summary>
  <pre>
    <code style="display: block;">
      # Our scatterplot relationship 1 solution code
      import plotly.express as px
      import plotly.graph_objects as go
      figure = px.scatter(data_frame = data, x="Sales",
                          y="TV", size="TV", trendline="ols")
      figure.show()
    </code>
  </pre>
</details>

Next we will examine the relationship between Newspaper advertising and number of units sold.

Create a scatterplot that examines the relationship between <b>Newspaper advertising</b> and the <b>number of units sold</b>.

What does the scatterplot tell you about the relationship between these variables?

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Create Scatterplot (2)</b></font></summary>
  <pre>
    <code style="display: block;">
      # Our scatterplot relationship 2 solution code
      figure = px.scatter(data_frame = data, x="Sales", y="Newspaper", size="Newspaper", trendline="ols")
      figure.show()
    </code>
  </pre>
</details>

Finally, we will examine the relationship between Radio advertising and number of units sold.

Create a scatterplot that examines the relationship between <b>Radio advertising</b> and the <b>number of units sold</b>.

What does the scatterplot tell you about the relationship between these variables?

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Create Scatterplot (3)</b></font></summary>
  <pre>
    <code style="display: block;">
      # Our scatterplot relationship 3 solution code
      figure = px.scatter(data_frame = data, x="Sales", y="Radio", size="Radio", trendline="ols")
      figure.show()
    </code>
  </pre>
</details>

Take note of your observations so far and ask yourself: Which form of advertising has the most influence on the sales of the product?

<b>Note:</b> You will need an answer to move on to the next part of the project.

We will compute the correlations between variables. Let's see if your answer to the previous question is right.

Find the correlations of all the columns with the sales column.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Find Correlations</b></font></summary>
  <pre>
    <code style="display: block;">
        # Correlation code solution
        correlation = data.corr()
        # Strongest positive correlation first; use ascending=True to reverse the order.
        print(correlation["Sales"].sort_values(ascending=False))
    </code>
  </pre>
</details>
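
As a rough illustration of how to read a correlation column, the sketch below builds a synthetic dataset in which one feature drives the target and another is pure noise. The column names `strong`, `weak`, and `target` are hypothetical and unrelated to the advertising data. By default, `DataFrame.corr()` computes the Pearson correlation coefficient, which the sketch cross-checks against `np.corrcoef`.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: "strong" drives the target, "weak" is unrelated noise.
rng = np.random.default_rng(0)
strong = rng.uniform(0, 300, size=200)
weak = rng.uniform(0, 100, size=200)
target = 0.05 * strong + rng.normal(0, 1, size=200)

toy = pd.DataFrame({"strong": strong, "weak": weak, "target": target})

# Pearson correlation of every column with the target, strongest first.
corr_with_target = toy.corr()["target"].sort_values(ascending=False)
print(corr_with_target)

# The same coefficient computed directly with NumPy for one pair of columns.
r_manual = np.corrcoef(strong, target)[0, 1]
print(r_manual)
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate a weak one. The target always correlates perfectly with itself, so it appears at the top of the sorted list.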

We will transition to the Machine Learning model portion of the project.

The first step is to split the dataset into <b>training and testing data</b>.

Perform an 80-20 split, using 80% of the dataset for training and 20% for testing.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Split Dataset</b></font></summary>
  <pre>
    <code style="display: block;">
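        # Train-Test split solution
        # Features: every column except Sales. Target: Sales.
        x = np.array(data.drop(columns=["Sales"]))
        y = np.array(data["Sales"])
        # Hold out 20% of the rows for testing; random_state makes the split reproducible.
        xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)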
    </code>
  </pre>
</details>
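
A quick way to confirm that a split has the intended proportions is to check the shapes of the resulting arrays. The sketch below uses hypothetical stand-in arrays (`x_demo`, `y_demo`) with 200 rows and 3 features rather than the advertising data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for a feature matrix and target: 200 rows, 3 features.
x_demo = np.arange(600, dtype=float).reshape(200, 3)
y_demo = np.arange(200, dtype=float)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=42
)
print(x_tr.shape, x_te.shape)  # (160, 3) (40, 3)

# Repeating the call with the same random_state gives the same rows again.
x_tr2, x_te2, _, _ = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=42
)
print(np.array_equal(x_te, x_te2))  # True
```

Fixing `random_state` is what makes the split, and therefore the score you report later, reproducible from run to run.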

We are now going to fit a Linear Regression model to the training data.

Fit the model and print its score on the testing data. For <code>LinearRegression</code>, <code>score</code> returns the coefficient of determination (R²). You can print out any other machine learning metrics if you like.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Modeling and Score</b></font></summary>
  <pre>
    <code style="display: block;">
        # Modeling and score printing solution
        model = LinearRegression()
        # Fit on the training split only.
        model.fit(xtrain, ytrain)
        # score() returns R^2, here measured on the held-out test split.
        print(model.score(xtest, ytest))
    </code>
  </pre>
</details>
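
If you want to report metrics beyond the default score, one possible approach is sketched below using `sklearn.metrics`. The data is synthetic and hypothetical (`x_demo`, `y_demo`, and the coefficients used to generate them are made up), so the printed numbers say nothing about the advertising dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: a linear signal plus noise.
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 100, size=(200, 3))
y_demo = x_demo @ np.array([0.05, 0.1, 0.0]) + 5 + rng.normal(0, 1, size=200)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=42
)
demo_model = LinearRegression().fit(x_tr, y_tr)
pred = demo_model.predict(x_te)

r2 = r2_score(y_te, pred)                       # same value as demo_model.score(x_te, y_te)
mae = mean_absolute_error(y_te, pred)           # average absolute error, in target units
rmse = np.sqrt(mean_squared_error(y_te, pred))  # penalizes large errors more heavily

print(f"R^2: {r2:.3f}  MAE: {mae:.3f}  RMSE: {rmse:.3f}")
```

MAE and RMSE are expressed in the same units as the target, which can make them easier to interpret than R² when you care about how far off a typical prediction is.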

Now that the model is trained, we can feed it new values and inspect its predictions.

Enter some test values and run your model on them.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Testing Values</b></font></summary>
  <pre>
    <code style="display: block;">
        # Testing values solution
        # features = [[TV, Radio, Newspaper]]
        features = np.array([[230.1, 37.8, 69.2]]) # You can put any values you want in here. These are here for testing purposes.
        print(model.predict(features))
    </code>
  </pre>
</details>
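
Note that `predict` expects a 2D array with one row per sample and one column per feature, which is why the feature values are wrapped in double brackets. The sketch below illustrates this on a hypothetical model (`demo_model`) trained on made-up data where the target is exactly `1*a + 2*b + 3*c`, so the predictions are easy to verify by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: the target is exactly 1*a + 2*b + 3*c.
x_demo = np.array(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 1, 0]], dtype=float
)
y_demo = x_demo @ np.array([1.0, 2.0, 3.0])

demo_model = LinearRegression().fit(x_demo, y_demo)

# Several samples at once: one row per sample, one column per feature.
batch = np.array([[1.0, 1.0, 0.0], [0.0, 2.0, 1.0]])
preds = demo_model.predict(batch)
print(preds.shape)  # (2,)

# A single sample stored as a 1D array must be reshaped to a single row first.
single = np.array([2.0, 0.0, 1.0]).reshape(1, -1)
single_pred = demo_model.predict(single)
print(single_pred.shape)  # (1,)
```

Passing several rows at once is a convenient way to compare predictions for different advertising budgets in a single call.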

## **Summary**

**Congratulations on completing the Future Sales Prediction project!**

We hope you have learned how different advertising methods affect product sales.