<a href="https://colab.research.google.com/github/MBouchaqour/DS_Masjid2024/blob/main/DS4_progML_ipyn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Keep here all the libraries needed for this project
import random
import matplotlib.pyplot as plt
import requests
import pandas as pd
import numpy as np
import seaborn as sns

**Scenario**: Our client runs a pizzeria. She prepares pizzas based on the reservations she receives, but she feels she is consistently making more pizzas than she needs, so she asked us for help. In our first meeting with her, we set out to:
1- Understand her challenges.
2- Ask whether she has any historical data, or any insight into which factors might be at play.

In [None]:
#using some random data
reservations = [random.randint(1, 31) for _ in range(30)]
pizza=[random.randint(15, 50) for _ in range(30)]
dt=list(zip(reservations,pizza))
print(dt)

In [None]:
#plot reservation and pizza using matplotlib.
plt.scatter(reservations, pizza)
plt.xlabel("Reservations")
plt.ylabel("Pizza")
plt.title("Reservations vs. Pizza")
plt.show()


In [None]:
# Our client provided us with this data
# Reading a file with txt extension directly from https://raw.githubusercontent.com/ahamez/progml/refs/heads/master/ch2/pizzas.txt

url = "https://raw.githubusercontent.com/ahamez/progml/refs/heads/master/ch2/pizzas.txt"
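response = requests.get(url, timeout=30)

if response.status_code == 200:
    data = response.text
    print(data)
else:
    # Stop here: the following cells need `data`
    raise RuntimeError(f"Failed to retrieve the file (HTTP {response.status_code}).")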


In [None]:
# create a dataframe from data variable above

rows = data.strip().split('\n')
data_list = [row.split() for row in rows]
df = pd.DataFrame(data_list[1:], columns=data_list[0])
print(df)


In [None]:
# Create a file pizza.txt from the dataframe, without the row index

df.to_csv('pizza.txt', index=False, sep=' ')


In [None]:
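# Using NumPy because arrays have advantages over lists (for example, element-wise arithmetic)

# Read the file written by the previous cell; a relative path also works outside Colab
X, Y = np.loadtxt("pizza.txt", delimiter=" ", skiprows=1, unpack=True)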
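
The round trip through `pizza.txt` is not strictly required. A minimal sketch, assuming the downloaded text is a whitespace-separated table with a single header row: `np.loadtxt` also accepts a file-like object, so the text can be wrapped in `io.StringIO`. The `sample_text` string and the `X_demo`, `Y_demo` names are made up for illustration and stand in for `data`, `X` and `Y`.

```python
import io
import numpy as np

# Hypothetical stand-in for the downloaded text (header row + two columns).
sample_text = "Reservations Pizzas\n2 4\n5 9\n7 15\n"

# np.loadtxt splits on whitespace by default; skiprows=1 drops the header,
# unpack=True returns one array per column.
X_demo, Y_demo = np.loadtxt(io.StringIO(sample_text), skiprows=1, unpack=True)

print(X_demo)  # reservations column
print(Y_demo)  # pizzas column
```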

In [None]:
# Let's check what we get for X and Y
print(X)
print(Y)
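
One concrete advantage of arrays over lists, sketched with made-up numbers: arithmetic on an array applies to every element at once, which is what lets a prediction be written as a single expression. The names `reservations_list`, `reservations_array` and `w_demo` are hypothetical and only serve this illustration.

```python
import numpy as np

# Hypothetical reservation counts, used only for illustration.
reservations_list = [2, 5, 7]
reservations_array = np.array(reservations_list)

w_demo = 2.0  # made-up weight, not a trained value

# With a list, the multiplication has to be written element by element.
pred_list = [x * w_demo for x in reservations_list]

# With an array, one expression is applied to every element.
pred_array = reservations_array * w_demo

print(pred_list)
print(pred_array)
```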

In [None]:
# Note: import all the libraries needed for this project once, in the first cell

# Plot X and Y to see how the data is spread
sns.set()
plt.axis([0, 50, 0, 50])
plt.xticks(fontsize=10)
plt.yticks(fontsize=10)
plt.xlabel("Reservations", fontsize=10)
plt.ylabel("Pizzas", fontsize=10)

plt.plot(X, Y, "bo")

plt.show()

# Remember: Your goal is to come up with the best-fit line. As you witnessed in class, each one of us interacted with the data differently.

# Case 1: Assuming b = 0 (the data has no bias)
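
With the bias fixed at 0, the model and the loss computed in the next cell can be written as follows. The symbol $n$ (the number of examples) is introduced here for illustration; it does not appear in the code, where `np.average` does the division.

$$\hat{y}_i = w\,x_i, \qquad L(w) = \frac{1}{n}\sum_{i=1}^{n}\left(w\,x_i - y_i\right)^2$$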

In [None]:
def predict(X, w, b=0):  # prediction: a line with slope w and intercept b
    return X * w + b


def loss(X, Y, w, b=0):  # mean squared error between predictions and Y
    return np.average((predict(X, w, b) - Y) ** 2)


# lr: learning rate
def train(X, Y, iterations, lr):
    w = 0
    for i in range(iterations):
        current_loss = loss(X, Y, w)
        # print("Iteration %4d => loss: %.6f" % (i, current_loss))  # Uncomment to follow the iterations
        if loss(X, Y, w + lr) < current_loss:
            w += lr
        elif loss(X, Y, w - lr) < current_loss:
            w -= lr
        else:
            return w
    raise Exception("Could not converge")

# Train the model using X and Y (this step is the training phase in ML)
w = train(X, Y, iterations=10_000, lr=0.001)
print("\nw=%.3f" % w)

# Plot the chart

sns.set()
plt.plot(X, Y, "bo")
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel("Reservations", fontsize=30)
plt.ylabel("Pizzas", fontsize=30)
x_edge, y_edge = 50, 50
plt.axis([0, x_edge, 0, y_edge])
plt.plot([0, x_edge], [0, predict(x_edge, w)], linewidth=1.0, color="g")  # fitted line y = w * x
plt.show()

# Time to return to our client and report that the best approximate weight is w ≈ 1.8 (1.844 to three decimals), so the model Y = 1.844X may help predict values
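
As a sanity check on the trained weight, a line through the origin has a closed-form least-squares weight: setting the derivative of the mean squared error to zero gives w = sum(x*y) / sum(x*x). The sketch below applies that formula to made-up data; `X_demo`, `Y_demo` and `w_closed` are hypothetical names, not part of the notebook.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x.
X_demo = np.array([1.0, 2.0, 3.0])
Y_demo = np.array([2.0, 4.0, 6.0])

# Least-squares weight for a line through the origin: sum(x*y) / sum(x*x).
w_closed = np.sum(X_demo * Y_demo) / np.sum(X_demo ** 2)

print("closed-form w = %.3f" % w_closed)
```

Applying the same formula to `X` and `Y` should give a value within about one learning-rate step of the weight returned by `train`, since `train` only moves `w` in increments of `lr`.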

In [None]:
# Our client wants to test our model on some data points. We hope she will be satisfied.
# Given that w = 1.844 and b = 0
# Let's look at the first 5 elements of X and, at the same time, their outputs using Y[:5]
print("Input data: ", X[:5], "Output data: ", Y[:5])
# We can also look up the known output for a given input:
m = 14  # pick a value from X[:5] above and check its output using Y[X == m]
print(f"The output of variable {m} is: {Y[X == m].mean()}")  # Taking the mean because some inputs have more than one output


In [None]:
# Our client already knows that the output for 14 should be 32,
# but she wants to test our best-fit model
y_bar = predict(14, 1.844)  # same input as the known data point above
print("Prediction using our model: ", y_bar)

What are your thoughts on the result?
It seems that our client is not happy with our model, so we need to look for ways to improve it. Let's include the bias, and consider adjusting the learning rate or even increasing the number of iterations.

In [None]:
def predict(X, w, b):  # the bias is now a required parameter (no default of 0)
    return X * w + b


def loss(X, Y, w, b):  # mean squared error, now including the bias
    return np.average((predict(X, w, b) - Y) ** 2)


# lr: learning rate
def train(X, Y, iterations, lr):
    b = 0
    w = 0
    for i in range(iterations):
        current_loss = loss(X, Y, w, b)
        # print("Iteration %4d => loss: %.6f" % (i, current_loss))  # Uncomment to see the iteration stages

        if loss(X, Y, w + lr, b) < current_loss:
            w += lr
        elif loss(X, Y, w - lr, b) < current_loss:
            w -= lr
        elif loss(X, Y, w, b + lr) < current_loss:
            b += lr
        elif loss(X, Y, w, b - lr) < current_loss:
            b -= lr
        else:
            return w, b

    raise Exception("Could not converge")

w, b = train(X, Y, iterations=10_000, lr=0.01)  # Note: same number of iterations; lr is now 0.01 instead of 0.001
print("\nw=%.3f, b=%.3f" % (w, b))

# Plot the chart

sns.set()
plt.plot(X, Y, "bo")
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel("Reservations", fontsize=30)
plt.ylabel("Pizzas", fontsize=30)
x_edge, y_edge = 50, 50
plt.axis([0, x_edge, 0, y_edge])
plt.plot([0, x_edge], [b, predict(x_edge, w, b)], linewidth=1.0, color="g")
plt.show()
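
One way to cross-check the trained `w` and `b` is to compare them with a standard least-squares fit. A minimal sketch using `np.polyfit` with degree 1, on made-up data; `X_demo`, `Y_demo`, `w_ref` and `b_ref` are hypothetical names introduced only for this comparison.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1.
X_demo = np.array([0.0, 1.0, 2.0, 3.0])
Y_demo = np.array([1.0, 3.0, 5.0, 7.0])

# A degree-1 fit returns the coefficients highest power first:
# slope, then intercept.
w_ref, b_ref = np.polyfit(X_demo, Y_demo, 1)

print("w=%.3f, b=%.3f" % (w_ref, b_ref))
```

On the real `X` and `Y`, the values from `train` may differ slightly from this reference, because `train` moves `w` and `b` in fixed steps of size `lr` and stops as soon as no single step lowers the loss.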

Let's check if our client will be happy with our model this time.

In [None]:
print("Given input: x=%d => output should be y=%.2f" % (21, Y[X == 21].mean()))  # This info is known by our client
# Using the last w, b
print("Prediction: x=%d => y_bar=%.2f" % (20, predict(20, w, b)))  # This info is not known by our client

# Upcoming Challenges:
- Our client provided us with new information that comes with the data.
- Many factors influence the predictions, such as the temperature and the number of tourists in town.
## We need to think of better ways to create a model that can handle complex situations.
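
One possible direction, sketched under loud assumptions: with several input factors, the prediction becomes a weighted sum with one weight per factor. Everything below is hypothetical, including the column choice (reservations, temperature, tourists), the numbers, and the names `X_multi`, `w_multi`, `b_multi`; none of it comes from the client's data or from a trained model.

```python
import numpy as np

# Hypothetical examples: one row per day; columns are
# reservations, temperature, tourists. All values are made up.
X_multi = np.array([
    [10.0, 20.0, 5.0],
    [20.0, 25.0, 8.0],
])

# Made-up weights (one per column) and bias, not trained values.
w_multi = np.array([1.0, 0.5, 2.0])
b_multi = 3.0

# Weighted sum of each row plus the bias.
y_hat = X_multi @ w_multi + b_multi

print(y_hat)
```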
