<h2><b>Waiter Tips Prediction</b></h2>
<h4><b>Author:</b> Data Science @ Georgia Tech</h4>
<p><b>Reference:</b> <a href="https://medium.com/coders-camp/225-machine-learning-projects-with-python-44d6ea8ace18">Medium</a></p>

<b>Welcome to the Waiter Tips Prediction self-guided project!</b>

In this self-guided project we are going to explore waiter tips in the restaurant industry.

We will take a look at the relationships between variables and extract valuable insights that will help us answer some very important business questions.

- First, import the necessary libraries and their submodules.
- Import the <code>pandas</code> and <code>numpy</code> libraries.
- Then import the <code>express</code> and <code>graph_objects</code> submodules from <code>plotly</code>.
- Finally, load the dataset into the notebook and display its first 5 rows.

The dataset can be found here: https://raw.githubusercontent.com/amankharwal/Website-data/master/tips.csv. Download it and save it as <code>tips.csv</code> in the same folder as this notebook so that <code>pd.read_csv("tips.csv")</code> can find it. Alternatively, <code>pd.read_csv</code> also accepts the URL directly.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Import Libraries + Read Data</b></font></summary>
    <pre>
      <code style="display: block;">
        # Solution Code
        import pandas as pd
        import numpy as np
        import plotly.express as px
        import plotly.graph_objects as go
        data = pd.read_csv("tips.csv")
        print(data.head())
      </code>
    </pre>
</details>

We are going to start off this project by analyzing the relationship between waiter tips and cost of the meal.

We are also going to group the data points by the day of the week into a scatterplot.

Create a <b>scatterplot</b> to model the relationship between waiter tips and cost of the meal. Be sure to group the points by <b>day of the week</b>.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Scatterplot Solution (1)</b></font></summary>
  <pre>
    <code style="display: block;">
      # Solution code text
      figure = px.scatter(data_frame = data, x="total_bill", y="tip", size="size", color= "day", trendline="ols")
      # the "ols" value for trendline means that a least squares regression line will be drawn in the scatterplot.
      figure.show()
    </code>
  </pre>
</details>
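
To make the `trendline="ols"` option less of a black box, here is a minimal sketch of what an ordinary least squares fit computes, using `numpy.polyfit` on a handful of made-up bill and tip values (not rows from the dataset). Plotly delegates the actual fit to `statsmodels`, so that package needs to be installed for the solution above to run. The arrays and variable names below are assumptions for illustration only.

```python
import numpy as np

# Made-up example values (not taken from tips.csv):
# each tip is 15% of the bill plus 0.50.
total_bill = np.array([10.0, 20.0, 30.0, 40.0])
tip = 0.15 * total_bill + 0.50

# A degree-1 polynomial fit is the least squares line:
# tip ≈ slope * total_bill + intercept.
slope, intercept = np.polyfit(total_bill, tip, deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Because the solution passes `color="day"`, Plotly fits one such line per day rather than a single line for the whole dataset.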

Next, we will examine the same relationship between the total bill and the tip amount, this time grouping the points by gender.

Create a scatterplot to model the relationship between waiter tips and cost of the meal. Be sure to group the points by <b>gender</b>.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Scatterplot Solution (2)</b></font></summary>
  <pre>
    <code style="display: block;">
      # Solution code
      figure = px.scatter(data_frame = data, x="total_bill", y="tip", size="size", color= "sex", trendline="ols")
      figure.show()
    </code>
  </pre>
</details>

Another relationship we are going to analyze will be in terms of total bill and tip, but, this time, we will group up our data points by time of day.

The two options for time of day are lunch and dinner.

Create a scatterplot to model the relationship between waiter tips and cost of meal. We will group the points by <b>time of day (lunch or dinner)</b>.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Scatterplot Solution (3)</b></font></summary>
  <pre>
    <code style="display: block;">
      # Solution code text
      figure = px.scatter(data_frame = data, x="total_bill", y="tip", size="size", color= "time", trendline="ols")
      figure.show()
    </code>
  </pre>
</details>

We will create a pie chart to see on which days waiters receive the most tips.

Create a chart that depicts this relationship.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Pie Chart Solution (1)</b></font></summary>
  <pre>
    <code style="display: block;">
      # Our solution
      figure = px.pie(data, values='tip', names='day', hole = 0.5)
      figure.show()
    </code>
  </pre>
</details>

According to the visualization above, waiters receive the largest share of tips on <b>Saturdays</b>.

We will create another pie chart that examines the percentage breakdown of tips by gender.

Create a chart that depicts this relationship.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Pie Chart Solution (2)</b></font></summary>
  <pre>
    <code style="display: block;">
      # Our solution
      figure = px.pie(data, values='tip', names='sex',hole = 0.5)
      figure.show()
    </code>
  </pre>
</details>

As the gender breakdown in the chart shows, men account for the majority of the total tip amount.

Create a chart that examines whether most tips are given by someone who smokes.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Pie Chart Solution (3)</b></font></summary>
  <pre>
    <code style="display: block;">
        # Our solution
        figure = px.pie(data, values='tip', names='smoker',hole = 0.5)
        figure.show()
    </code>
  </pre>
</details>

As you can see, non-smokers account for the larger share of tips: roughly 62% of the total tip amount in the chart comes from non-smokers.
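
The pie charts above sum the `tip` column per category, so a group can have the larger slice simply because it contains more parties. A rough sketch of how to separate the two effects with `groupby`, using a few made-up rows in place of the real dataset (the values and the `sample` name are assumptions):

```python
import pandas as pd

# Made-up rows for illustration only; the real data comes from tips.csv.
sample = pd.DataFrame({
    "smoker": ["No", "No", "No", "Yes", "Yes"],
    "tip": [2.0, 3.0, 4.0, 5.0, 3.0],
})

# Total tips (what the pie chart shows), number of parties,
# and average tip per party.
summary = sample.groupby("smoker")["tip"].agg(["sum", "count", "mean"])

# Share of the overall tip amount per group, as a percentage.
summary["share_pct"] = 100 * summary["sum"] / summary["sum"].sum()
print(summary)
```

In this toy sample, non-smokers contribute the larger total while smokers leave the larger average tip, which is why the total alone does not show who tips more per party.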

Create a pie chart that breaks down tips by time of day (lunch or dinner).

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Pie Chart Solution (4)</b></font></summary>
  <pre>
    <code style="display: block;">
        # Our solution
        figure = px.pie(data, values='tip', names='time', hole = 0.5)
        figure.show()
    </code>
  </pre>
</details>

We see that most tips come from dinner rather than lunch.

Now it is time to build the prediction model! Since most columns are categorical, we need to convert them into numerical values.

We can achieve this by mapping column values to numbers with a dictionary, a data structure that stores key-value pairs.

Map the sex, smoker, day, and time columns to numerical values.

In [None]:
# Write your code here.

<details>
  <summary>Click for solution: <font color="sky blue"><b>Mapping Values</b></font></summary>
  <pre>
    <code style="display: block;">
        # Our solution
        data["sex"] = data["sex"].map({"Female": 0, "Male": 1})
        data["smoker"] = data["smoker"].map({"No": 0, "Yes": 1})
        data["day"] = data["day"].map({"Thur": 0, "Fri": 1, "Sat": 2, "Sun": 3})
        data["time"] = data["time"].map({"Lunch": 0, "Dinner": 1})
        data.head()
    </code>
  </pre>
</details>
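
One thing to watch with `Series.map` and a dictionary: any value that is not a key in the dictionary becomes `NaN`, which would later break the model fit. A small sketch of a check for that, on a made-up column containing a deliberately misspelled label:

```python
import pandas as pd

# Made-up column with one unexpected label ("Thurs") to show the failure mode.
days = pd.Series(["Thur", "Fri", "Sat", "Sun", "Thurs"])
day_codes = {"Thur": 0, "Fri": 1, "Sat": 2, "Sun": 3}

mapped = days.map(day_codes)

# Values missing from the dictionary are mapped to NaN.
unmapped = days[mapped.isna()].unique()
print(mapped.tolist())
print("Unmapped labels:", list(unmapped))
```

Re-running the mapping cell on an already mapped column has the same effect, since the numeric values are no longer dictionary keys.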

Split the dataset into testing and training sets.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Train-Test Split</b></font></summary>
  <pre>
    <code style="display: block;">
        # Our solution
        from sklearn.model_selection import train_test_split
        x = np.array(data[["total_bill", "sex", "smoker", "day", "time", "size"]])
        y = np.array(data["tip"])
        # For the random_state, you can use any value you want. In this case, we chose 42.
        xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
    </code>
  </pre>
</details>
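
As a quick sanity check on what `train_test_split` returns, the sketch below splits a synthetic array (random numbers, not the tips data) and prints the resulting shapes. With `test_size=0.2`, 20% of the rows go to the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 50 rows, 6 feature columns, 1 target value per row.
rng = np.random.default_rng(0)
x = rng.random((50, 6))
y = rng.random(50)

xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.2, random_state=42
)
print(xtrain.shape, xtest.shape, ytrain.shape, ytest.shape)
```

Fixing `random_state` makes the split reproducible, so repeated runs select the same rows for training and testing.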

Create a Linear Regression model and fit it.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Linear Regression</b></font></summary>
  <pre>
    <code style="display: block;">
        # Our solution
        from sklearn.linear_model import LinearRegression
        model = LinearRegression()
        model.fit(xtrain, ytrain)
    </code>
  </pre>
</details>

Give inputs to the Linear Regression model and show the predictions.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Prediction</b></font></summary>
  <pre>
    <code style="display: block;">
        # Our solution
        # Feature order: [[total_bill, sex, smoker, day, time, size]]
        features = np.array([[24.50, 1, 0, 0, 1, 4]])
        model.predict(features)
    </code>
  </pre>
</details>
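
A single prediction does not say how good the model is. One common way to check, sketched here on synthetic data rather than the tips dataset, is to score the fitted model on the held-out test set with R² and mean absolute error from `sklearn.metrics`. The data-generating rule and all values below are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: tip = 15% of the bill + 0.20 per person + small noise.
rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, size=200)
size = rng.integers(1, 7, size=200)
tip = 0.15 * total_bill + 0.20 * size + rng.normal(0, 0.1, size=200)

x = np.column_stack([total_bill, size])
xtrain, xtest, ytrain, ytest = train_test_split(
    x, tip, test_size=0.2, random_state=42
)

model = LinearRegression().fit(xtrain, ytrain)
predictions = model.predict(xtest)

# R² close to 1 and a small MAE indicate predictions close to the held-out values.
r2 = r2_score(ytest, predictions)
mae = mean_absolute_error(ytest, predictions)
print(f"R^2 = {r2:.3f}, MAE = {mae:.3f}")
```

On the real data, the same two metric calls can be applied to `model.predict(xtest)` and `ytest` from the earlier cells.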

## **Summary**
**Congratulations on completing the Waiter Tips Prediction project!**

As you saw, we can use a machine learning model such as linear regression to predict tip amounts from the other columns in the dataset.

We hope you have learned something new throughout these self-guided projects. Congratulations!