# Online Food Ordering

### About Data

#### **Description:**
##### This project is based on the analysis of a dataset of online food orders, with the aim of exploring the relationships between demographic factors, location and customer behavior.
#### **Attributes:**
##### Age: Age of the customer.
##### Gender: Gender of the customer.
##### Marital Status: Marital status of the customer.
##### Occupation: Occupation of the customer.
##### Monthly Income: Monthly income of the customer.
##### Educational Qualifications: Educational qualifications of the customer.
##### Family Size: Number of individuals in the customer's family.
##### Latitude: Latitude of the customer's location.
##### Longitude: Longitude of the customer's location.
##### Pin code: Pin code of the customer's location.
##### Output: Current status of the order (e.g., pending, confirmed, delivered).
##### Feedback: Feedback provided by the customer after receiving the order.
#### **Purpose:**
##### Explore how the customer's location, age, gender, family size and occupation relate to the decision to order food online.

# Step 1 | Importing Data & Libraries

In [14]:
!pip install pandas
!pip install matplotlib
!pip install geopy
!pip install colorama



In [15]:
# Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from geopy.geocoders import Nominatim
from IPython.display import display, HTML  # Displaying HTML output in Jupyter notebooks
import warnings  # Suppress warnings
from colorama import Fore, Style  # Colored terminal text

# Step 2 | Loading Data

In [16]:
df_Foods = pd.read_csv('onlinefoods.csv')

# Step 3 | Data Overview

In [17]:
print("===================================================================================")
print("TOP 5 in dataset")

TOP 5 in dataset


In [33]:
df_Foods.head(5)

|  | Age | Gender | Marital Status | Occupation | Monthly Income | Educational Qualifications | Family size | latitude | longitude | Pin code | Output | Feedback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20 | Female | Single | Student | No Income | Post Graduate | 4 | 12.9766 | 77.5993 | 560001 | Yes | Positive |
| 1 | 24 | Female | Single | Student | Below Rs.10000 | Graduate | 3 | 12.977 | 77.5773 | 560009 | Yes | Positive |
| 2 | 22 | Male | Single | Student | Below Rs.10000 | Post Graduate | 3 | 12.9551 | 77.6593 | 560017 | Yes | Negative |
| 3 | 22 | Female | Single | Student | No Income | Graduate | 6 | 12.9473 | 77.5616 | 560019 | Yes | Positive |
| 4 | 22 | Male | Single | Student | Below Rs.10000 | Post Graduate | 4 | 12.985 | 77.5533 | 560010 | Yes | Positive |


In [35]:
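# Drop the stray 'Unnamed: 12' column if the CSV export produced one (no-op otherwise)
df_Foods.drop(columns=['Unnamed: 12'], inplace=True, errors='ignore')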

In [37]:

print("===================================================================================")
print("COLUMN REMOVED: Unnamed: 12")
print("===================================================================================")

COLUMN REMOVED: Unnamed: 12


In [38]:
print("===================================================================================")
print("Information columns")
df_Foods.info()
print("===================================================================================")
print("THERE ARE NO NULL VALUES")
print("===================================================================================")


Information columns
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 388 entries, 0 to 387
Data columns (total 12 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Age                         388 non-null    int64  
 1   Gender                      388 non-null    object 
 2   Marital Status              388 non-null    object 
 3   Occupation                  388 non-null    object 
 4   Monthly Income              388 non-null    object 
 5   Educational Qualifications  388 non-null    object 
 6   Family size                 388 non-null    int64  
 7   latitude                    388 non-null    float64
 8   longitude                   388 non-null    float64
 9   Pin code                    388 non-null    int64  
 10  Output                      388 non-null    object 
 11  Feedback                    388 non-null    object 
dtypes: float64(2), int64(3), object(7)
memory usage: 36.5+ KB
THERE ARE NO NULL VALUES
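
The "no null values" message above is a hard-coded string, so it would still print if the data changed. A minimal sketch of computing the check instead is shown here; the `sample` frame and the `null_report` helper are illustrative assumptions (in the notebook, `df_Foods` would be passed in).

```python
import pandas as pd

# Toy frame built from three rows of the head() output; stands in for df_Foods.
sample = pd.DataFrame({
    "Age": [20, 24, 22],
    "Gender": ["Female", "Female", "Male"],
    "Feedback": ["Positive", "Positive", "Negative"],
})

def null_report(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values in each column (hypothetical helper)."""
    return df.isnull().sum()

nulls = null_report(sample)
total_nulls = int(nulls.sum())

print("Null values per column:")
print(nulls)
print("Total null values:", total_nulls)
```

Printing the computed total keeps the message consistent with whatever `df_Foods.info()` reports.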

In [60]:
print("======================================Describe columns=============================================")



In [39]:
df_Foods.describe()

|  | Age | Family size | latitude | longitude | Pin code |
|---|---|---|---|---|---|
| count | 388.0 | 388.0 | 388.0 | 388.0 | 388.0 |
| mean | 24.628866 | 3.280928 | 12.972058 | 77.60016 | 560040.113402 |
| std | 2.975593 | 1.351025 | 0.044489 | 0.051354 | 31.399609 |
| min | 18.0 | 1.0 | 12.8652 | 77.4842 | 560001.0 |
| 25% | 23.0 | 2.0 | 12.9369 | 77.565275 | 560010.75 |
| 50% | 24.0 | 3.0 | 12.977 | 77.5921 | 560033.5 |
| 75% | 26.0 | 4.0 | 12.997025 | 77.6309 | 560068.0 |
| max | 33.0 | 6.0 | 13.102 | 77.7582 | 560109.0 |


In [40]:
print("===================================================================================")
print("Duplicate rows found:", df_Foods.duplicated().sum())
print("===================================================================================")

Duplicate rows found: 103
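
The count above only reports how many rows repeat an earlier row. A minimal sketch of inspecting and removing them follows; the `sample` frame is an illustrative stand-in for `df_Foods`, and whether to drop at all is an assumption, since the dataset has no customer ID and identical rows could be distinct respondents.

```python
import pandas as pd

# Toy frame with one repeated row; stands in for df_Foods.
sample = pd.DataFrame({
    "Age": [20, 24, 24, 22],
    "Gender": ["Female", "Female", "Female", "Male"],
    "Pin code": [560001, 560009, 560009, 560017],
})

# duplicated() flags repeats after the first occurrence;
# keep=False flags every copy, which is handy for inspection.
n_dupes = int(sample.duplicated().sum())
all_copies = sample[sample.duplicated(keep=False)]

# drop_duplicates() keeps the first occurrence of each row.
deduped = sample.drop_duplicates().reset_index(drop=True)

print("Duplicate rows found:", n_dupes)
print("Rows after dropping duplicates:", len(deduped))
```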


In [41]:
print("===================================================================================")
print("Number of rows and columns:", df_Foods.shape)
print("===================================================================================")

Number of rows and columns: (388, 12)


In [42]:
def unique_values(train_dataset):
    """Display each column's name, dtype and its first seven unique values."""
    df_UniqueValues = pd.DataFrame({
        'Name': train_dataset.columns,
        'Type': [train_dataset[col].dtype for col in train_dataset.columns],
        # Only the first 7 unique values are shown to keep the table compact
        'Unique Values': [', '.join(map(str, train_dataset[col].unique()[0:7])) for col in train_dataset.columns]
    })
    display(df_UniqueValues)

print("===================================================================================")
print("Unique Values in Dataset")
unique_values(df_Foods)
print("===================================================================================")

Unique Values in Dataset


|  | Name | Type | Unique Values |
|---|---|---|---|
| 0 | Age | int64 | 20, 24, 22, 27, 23, 21, 28 |
| 1 | Gender | object | Female, Male |
| 2 | Marital Status | object | Single, Married, Prefer not to say |
| 3 | Occupation | object | Student, Employee, Self Employeed, House wife |
| 4 | Monthly Income | object | No Income, Below Rs.10000, More than 50000, 10... |
| 5 | Educational Qualifications | object | Post Graduate, Graduate, Ph.D, Uneducated, School |
| 6 | Family size | int64 | 4, 3, 6, 2, 5, 1 |
| 7 | latitude | float64 | 12.9766, 12.977, 12.9551, 12.9473, 12.985, 12.... |
| 8 | longitude | float64 | 77.5993, 77.5773, 77.6593, 77.5616, 77.5533, 7... |
| 9 | Pin code | int64 | 560001, 560009, 560017, 560019, 560010, 560103... |




# Step 4 | Feature Engineering

In [44]:
# Function to get the county name from latitude and longitude
def get_county_name(latitude, longitude):
    # Initialize the geocoder
    geolocator = Nominatim(user_agent="Online Food Order")
    # Build the query string in the format "latitude, longitude"
    location = f"{latitude}, {longitude}"
    # Use the geocoder to obtain the address
    address = geolocator.reverse(location)
    # Check whether an address was found
    if address:
        # Extract the county name from the 'address' sub-dictionary
        county = address.raw['address'].get('county', '')  # If 'county' does not exist, returns an empty string
        return county
    else:
        return None

# Apply the function to every row of the DataFrame to obtain county names
df_Foods['county'] = df_Foods.apply(lambda row: get_county_name(row['latitude'], row['longitude']), axis=1)
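
The `apply` call above sends one reverse-geocoding request per row, including the 103 duplicated rows, which repeat coordinates already looked up. A minimal sketch of looking up each unique coordinate pair only once is shown below. The `add_county_column` helper and the `fake_lookup` stand-in are hypothetical names used for illustration; in the notebook, `get_county_name` would be passed as `lookup`. Nominatim's usage policy also asks for at most one request per second, which geopy's `geopy.extra.rate_limiter.RateLimiter` can enforce; that part is not shown here.

```python
import pandas as pd

def add_county_column(df, lookup, lat_col="latitude", lon_col="longitude"):
    """Return a copy of df with a 'county' column, calling lookup once per unique coordinate pair."""
    unique_coords = df[[lat_col, lon_col]].drop_duplicates()
    # Cache: (latitude, longitude) -> county name
    cache = {
        (lat, lon): lookup(lat, lon)
        for lat, lon in unique_coords.itertuples(index=False, name=None)
    }
    out = df.copy()
    out["county"] = [cache[(lat, lon)] for lat, lon in zip(out[lat_col], out[lon_col])]
    return out

# Stand-in for get_county_name so the sketch runs without network access.
calls = []
def fake_lookup(lat, lon):
    calls.append((lat, lon))
    return f"county_{len(calls)}"

# Three rows, two distinct coordinate pairs (values taken from the head() output).
sample = pd.DataFrame({
    "latitude": [12.9766, 12.977, 12.9766],
    "longitude": [77.5993, 77.5773, 77.5993],
})
result = add_county_column(sample, fake_lookup)
print(result)
print("Lookups performed:", len(calls))
```

With the real geocoder the same pattern would be called as `df_Foods = add_county_column(df_Foods, get_county_name)`.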

In [47]:
print("===================================================================================")
print("County column created based on latitude and longitude: county")
print("===================================================================================")

County column created based on latitude and longitude: county


In [47]:
unique_values(df_Foods)

|  | Name | Type | Unique Values |
|---|---|---|---|
| 0 | Age | int64 | 20, 24, 22, 27, 23, 21, 28 |
| 1 | Gender | object | Female, Male |
| 2 | Marital Status | object | Single, Married, Prefer not to say |
| 3 | Occupation | object | Student, Employee, Self Employeed, House wife |
| 4 | Monthly Income | object | No Income, Below Rs.10000, More than 50000, 10... |
| 5 | Educational Qualifications | object | Post Graduate, Graduate, Ph.D, Uneducated, School |
| 6 | Family size | int64 | 4, 3, 6, 2, 5, 1 |
| 7 | latitude | float64 | 12.9766, 12.977, 12.9551, 12.9473, 12.985, 12.... |
| 8 | longitude | float64 | 77.5993, 77.5773, 77.6593, 77.5616, 77.5533, 7... |
| 9 | Pin code | int64 | 560001, 560009, 560017, 560019, 560010, 560103... |


# Step 5 | Exploratory Data Analysis

# Step 6 | Outlier Detection
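
One common way to flag outliers in a numeric column is the 1.5 × IQR rule; this is an assumed technique for illustration, not necessarily the method intended for this step. Using the `Age` quartiles from the `describe()` output (25% = 23, 75% = 26), the IQR is 3 and the fences are 18.5 and 30.5, so the minimum (18) and maximum (33) ages would be flagged. The `iqr_bounds` helper and the `ages` series below are hypothetical.

```python
import pandas as pd

def iqr_bounds(series: pd.Series, k: float = 1.5):
    """Return (lower, upper) fences of the k * IQR rule for a numeric series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Illustrative ages only; in the notebook this would be df_Foods['Age'].
ages = pd.Series([18, 22, 23, 23, 24, 24, 25, 26, 26, 33], name="Age")

lower, upper = iqr_bounds(ages)
outliers = ages[(ages < lower) | (ages > upper)]

print("Fences:", lower, upper)
print("Flagged values:", outliers.tolist())
```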

# Step 7 | Numerical Feature Distribution

# Step 7.1 | Normalization

# Step 8 | Correlation Analysis

# Step 9 | Encoding
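
Most columns in this dataset are of `object` dtype, so they need to be converted to numbers before modeling. A minimal sketch follows, assuming one-hot encoding for `Gender` and a binary mapping for `Feedback`; the choice of encoders and the `sample` frame are assumptions, and the label values are taken from the sample rows shown in Step 3.

```python
import pandas as pd

# Toy frame built from three rows of the head() output; stands in for df_Foods.
sample = pd.DataFrame({
    "Age": [20, 24, 22],
    "Gender": ["Female", "Female", "Male"],
    "Feedback": ["Positive", "Positive", "Negative"],
})

# One-hot encode a nominal column: one 0/1 indicator column per category.
encoded = pd.get_dummies(sample, columns=["Gender"], dtype=int)

# Map a two-valued column to 0/1; strip() guards against stray whitespace in labels.
encoded["Feedback"] = encoded["Feedback"].str.strip().map({"Positive": 1, "Negative": 0})

print(encoded)
```

Any label missing from the mapping becomes `NaN`, which makes unexpected categories easy to spot with `encoded['Feedback'].isnull().sum()`.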

# Step 10 | Model Building

# Base Classifier

# Fitting Classifier

# Evaluating Classifier

# Storing Results in DF and Visualizing the Results