In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression

In [2]:
# Load the dataset
df = sns.load_dataset('tips')
df.head()


Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [3]:
# Select feature columns
x = df[['total_bill','size']]

# Select target column
y = df['tip']

In [4]:
model = LinearRegression()
model.fit(x,y)

In [5]:
# Let's predict the tip for a total bill of 16.99 and a party size of 2
total_bill = np.array([[16.99,2]])
predicted_tip = model.predict(total_bill)



In [6]:
print(f'Predicted tip for a total bill of 16.99 with size 2: {round(predicted_tip[0],2)}')

Predicted tip for a total bill of 16.99 with size 2: 2.63


In [7]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [8]:
# Encode the categorical smoker column as numbers: Yes -> 1, No -> 0
df['smoker_numerical'] = df['smoker'].map({"Yes":1,"No":0})

In [9]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,smoker_numerical
0,16.99,1.01,Female,No,Sun,Dinner,2,0
1,10.34,1.66,Male,No,Sun,Dinner,3,0
2,21.01,3.5,Male,No,Sun,Dinner,3,0
3,23.68,3.31,Male,No,Sun,Dinner,2,0
4,24.59,3.61,Female,No,Sun,Dinner,4,0


In [10]:
# Select features including the encoded categorical variable
x = df[['total_bill','smoker_numerical','size']]
# Select target column
y = df['tip']

In [11]:
model = LinearRegression()
model.fit(x,y)


In [12]:
model.predict([[16.99,1,8]])  # Predicting tip for a total bill of 16.99, smoker Yes (1), size 8



array([3.66339921])

In [13]:
df['smoker'].value_counts()  # Check how many rows fall into each class of the original smoker column

smoker
No     151
Yes     93
Name: count, dtype: int64

# Classification


---

### 🔹 What is **Classification**?

**Classification** is when you **predict a category or label** for something based on the data you have.

#### Example:

You have many photos of animals — some cats, some dogs.
You want a computer program to look at a new photo and tell whether it’s a **cat or a dog**.

✅ This kind of problem is called **Classification**.

> **In short**: Classification means **putting something into a group or class**.

---

### 🔹 What is **Logistic Regression**?

Even though it has “regression” in the name, **Logistic Regression is used for Classification**: it predicts the probability of a class, not an open-ended number.

It’s typically used when your **output is Yes/No**, **True/False**, or **1/0**.

#### Example:

You want to predict if a person will get a **loan** or not.

You have some data:

* Age
* Income
* Job status

And you want the answer:

* 1 → Will get a loan
* 0 → Will not get a loan

You can use **Logistic Regression** here.

---

### 🔹 How Logistic Regression works:

1. It learns from your data.
2. It calculates the **probability** of something happening.
3. It compares that probability with a threshold (usually 0.5):

   * More than 0.5 → it says **YES** (1)
   * Otherwise → it says **NO** (0)
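
The steps above can be sketched in a few lines of plain Python. This is only an illustration: the `weights`, the `intercept`, and the helper names `sigmoid` and `predict_label` are made up here, not values learned from the tips data or part of scikit-learn.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(features, weights, intercept, threshold=0.5):
    """Return (probability, label) for one sample."""
    # Linear score: intercept + sum of weight * feature
    z = intercept + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    # Turn the probability into a hard 1/0 decision
    label = 1 if probability > threshold else 0
    return probability, label

# Hypothetical weights for two features (not fitted to any real data)
weights = [0.8, -0.5]
intercept = -1.0

prob, label = predict_label([3.0, 1.0], weights, intercept)
print(f"probability={prob:.3f}, label={label}")
```

Raising or lowering `threshold` trades false alarms against missed cases; 0.5 is just the usual starting point.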

---

### 🧠 Simple Summary Table:

| Term                    | Meaning                                                       |
| ----------------------- | ------------------------------------------------------------- |
| **Classification**      | Predicting a **class or group** (e.g., Cat or Dog)            |
| **Logistic Regression** | A tool/algorithm to **predict yes or no (1/0)** based on data |

---


## *Logistic Regression*

In [14]:
from sklearn.linear_model import LogisticRegression

In [15]:
# Select features columns
x = df[['total_bill','tip','size']]
# Select target column
y = df['smoker']

# load/call the Logistic regression model
model = LogisticRegression()
# Fit the model
model.fit(x, y)

In [16]:
# Predicting if a person is a smoker or not based on total bill, tip, and size
model.predict([[50,3,2]])



array(['Yes'], dtype=object)

# *Naive Bayes*


---

### 🔷 What is Naive Bayes?

**Naive Bayes** is a **supervised learning algorithm** used for **classification**.
It helps us **predict a category or class**, like:

* Is the email spam or not?
* Is the review positive or negative?
* Will the person get sick or not?

---

### 🔸 How it works:

It is based on **Bayes' Theorem**, which uses **probability** to make predictions.

It calculates:

> “What is the chance this item belongs to a particular class, based on the data?”

#### Example:

If an email contains the word **“free”**, Naive Bayes checks:

* How often “free” appears in spam emails
* How often “free” appears in non-spam emails

It then combines these frequencies with how common spam is overall to decide whether the new email is spam or not.
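
Here is a worked version of this idea, as a minimal sketch. The email counts below are invented purely for illustration; the calculation itself is Bayes' theorem applied to a single word.

```python
# Hypothetical training counts (made up for illustration)
spam_total = 40          # number of spam emails
ham_total = 60           # number of non-spam emails
spam_with_free = 30      # spam emails containing "free"
ham_with_free = 6        # non-spam emails containing "free"

# Prior probabilities: how common each class is overall
p_spam = spam_total / (spam_total + ham_total)
p_ham = 1 - p_spam

# Likelihoods: how often "free" appears within each class
p_free_given_spam = spam_with_free / spam_total
p_free_given_ham = ham_with_free / ham_total

# Bayes' theorem: P(spam | free) = P(free | spam) * P(spam) / P(free)
p_free = p_free_given_spam * p_spam + p_free_given_ham * p_ham
p_spam_given_free = p_free_given_spam * p_spam / p_free

prediction = "spam" if p_spam_given_free > 0.5 else "not spam"
print(f"P(spam | 'free') = {p_spam_given_free:.3f} -> {prediction}")
```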

---

### 🔸 Why is it called "Naive"?

It assumes that all features (like words in an email) are **independent** of each other once the class is known. That is usually not true, but this **simple assumption makes the model fast** and often surprisingly good!
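
In practice, the independence assumption means the model can simply multiply one probability per feature. The sketch below shows that for a few words. The probabilities in `word_probs` and `priors` are hypothetical numbers, and `class_scores` is an illustrative helper, not a scikit-learn function.

```python
# Hypothetical per-class word probabilities (made up for illustration)
word_probs = {
    "spam": {"free": 0.75, "winner": 0.40, "meeting": 0.05},
    "ham":  {"free": 0.10, "winner": 0.02, "meeting": 0.30},
}
priors = {"spam": 0.4, "ham": 0.6}

def class_scores(words):
    """Score each class as prior * product of per-word probabilities."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            # Naive step: each word contributes independently of the others
            score *= word_probs[label][word]
        scores[label] = score
    # Normalise so the scores sum to 1 and can be read as probabilities
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

posterior = class_scores(["free", "winner"])
best = max(posterior, key=posterior.get)
print(posterior, best)
```

Because each feature needs only its own per-class probability, training reduces to counting, which is why the model is so fast.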

---

### 🔸 Where is it used?

* Spam detection
* Sentiment analysis (positive/negative)
* News categorization
* Disease prediction
* Face recognition

---

### 🔸 Advantages:

* Very fast and simple
* Works well with text data
* Good for large datasets
* Easy to train

---

### 🧠 Summary:

| Feature    | Description                        |
| ---------- | ---------------------------------- |
| Type       | Classification                     |
| Based on   | Bayes’ Theorem (probability)       |
| Assumption | Features are independent           |
| Best for   | Text data (emails, reviews)        |
| Example    | Spam detection, medical prediction |

---

In [17]:
from sklearn.naive_bayes import GaussianNB as NaiveBayes

In [18]:
# Select features and target variable
x = df[['total_bill','size']]
y = df['smoker']

# Load the Naive Bayes model
model = NaiveBayes()
# Fit the model
model.fit(x, y)
# Predicting if a person is a smoker or not based on total bill and size
model.predict([[50,2]])  # Predicting for a total bill of 50 and size 2 



array(['Yes'], dtype='<U3')

# Decision Trees

In [20]:
# Import Decision Trees.
from sklearn.tree import DecisionTreeClassifier

In [None]:
#Data set
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,smoker_numerical
0,16.99,1.01,Female,No,Sun,Dinner,2,0
1,10.34,1.66,Male,No,Sun,Dinner,3,0
2,21.01,3.5,Male,No,Sun,Dinner,3,0
3,23.68,3.31,Male,No,Sun,Dinner,2,0
4,24.59,3.61,Female,No,Sun,Dinner,4,0


In [23]:
# Select the feature columns from the df
x = df[['total_bill','size','tip']]
# Select the target column from the df
y = df['smoker']
# Load the model
model = DecisionTreeClassifier()
# Fit the model
model.fit(x,y)

In [None]:
# Predict for total_bill=89, size=3, tip=10 (same order as the feature columns)
smoker = model.predict([[89,3,10]])
smoker



array(['Yes'], dtype=object)