# AI/ML Use Cases for Dairy Dataset

This notebook covers 14 AI/ML applications of a dairy dataset. Each use case shows how a machine learning model can address a specific problem or surface insights in the dairy industry. Let’s explore them one by one!

## 1. Demand Forecasting
**What is it?**

We want to predict how much of a product will be sold in the future. This helps us make sure we have enough stock and avoid running out.

**How we do it:**

- We use past sales data.
- We consider factors like product type, location, and sales channel.
- Time series models such as ARIMA, or neural networks such as LSTMs, can make these predictions.

In [3]:
import pandas as pd

# Load the dairy dataset
dairy_data = pd.read_csv('dairy_dataset.csv')  # Replace with the correct path to your dataset

# Check the first few rows to confirm the data is loaded
print(dairy_data.head())


        Location  Total Land Area (acres)  Number of Cows Farm Size  \
0      Telangana                   310.84              96    Medium   
1  Uttar Pradesh                    19.19              44     Large   
2     Tamil Nadu                   581.69              24    Medium   
3      Telangana                   908.00              89     Small   
4    Maharashtra                   861.95              21    Medium   

         Date  Product ID Product Name                 Brand  \
0  2022-02-17           5    Ice Cream           Dodla Dairy   
1  2021-12-01           1         Milk                  Amul   
2  2022-02-28           4       Yogurt           Dodla Dairy   
3  2019-06-09           3       Cheese  Britannia Industries   
4  2020-12-14           8   Buttermilk          Mother Dairy   

   Quantity (liters/kg)  Price per Unit  ...  Production Date  \
0                222.40           85.72  ...       2021-12-27   
1                687.48           42.61  ...       2021-10

In [4]:
from statsmodels.tsa.arima.model import ARIMA
import pandas as pd

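# Example: Predicting future sales
# .copy() gives an independent DataFrame and avoids pandas' SettingWithCopyWarning
sales_data = dairy_data[['Date', 'Quantity Sold (liters/kg)']].copy()
sales_data['Date'] = pd.to_datetime(sales_data['Date'])
sales_data.set_index('Date', inplace=True)

# Train ARIMA model
# Note: each row is one sale, so dates repeat and are not in order. statsmodels
# therefore ignores the date index and models the rows in their stored order.
model = ARIMA(sales_data, order=(5, 1, 0))
model_fit = model.fit()

# Forecast future sales
forecast = model_fit.forecast(steps=10)  # Next 10 steps, indexed by row position (hence 4325-4334)
print(forecast)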


4325    193.030545
4326    208.780716
4327    166.144844
4328    191.936227
4329    187.461045
4330    232.683696
4331    195.480808
4332    198.875965
4333    194.304658
4334    200.416626
Name: predicted_mean, dtype: float64


## 2. Price Optimization
**What is it?**

We want to set the best price for products to make the most money.

**How we do it:**

- Use data like price, quantity sold, and revenue.
- Models like regression or reinforcement learning can help us find the best price.

In [5]:
from sklearn.linear_model import LinearRegression
import numpy as np

# Feature: price per unit; target: quantity sold
X = dairy_data[['Price per Unit']].values  # already 2-D (n_rows, 1), so no reshape is needed
y = dairy_data['Quantity Sold (liters/kg)']

# Train a linear regression model
model = LinearRegression()
model.fit(X, y)

# Predict sales quantity for a new price
new_price = np.array([[45]])  # Example new price
predicted_sales = model.predict(new_price)
print(f"Predicted sales for price 45: {predicted_sales}")


Predicted sales for price 45: [248.25920878]
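The regression above estimates a demand curve (quantity as a function of price); it does not yet pick a price. A minimal sketch of that last step, assuming demand is linear, $q = a + b\,p$ with $b < 0$, and that the goal is revenue rather than profit (costs are ignored): revenue $R(p) = p\,(a + b\,p)$ peaks where $dR/dp = a + 2bp = 0$, i.e. at $p^* = -a/(2b)$. The price and quantity data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical price/quantity observations with a downward-sloping demand curve
rng = np.random.default_rng(1)
prices = rng.uniform(20, 80, size=200)
quantities = 500 - 5 * prices + rng.normal(0, 10, size=200)

# Fit quantity = a + b * price
demand = LinearRegression().fit(prices.reshape(-1, 1), quantities)
a, b = demand.intercept_, demand.coef_[0]

# Revenue R(p) = p * (a + b*p) peaks where dR/dp = a + 2*b*p = 0 (requires b < 0)
best_price = -a / (2 * b)

# Cross-check with a grid search over the observed price range
grid = np.linspace(prices.min(), prices.max(), 1000)
grid_revenue = grid * demand.predict(grid.reshape(-1, 1))
grid_best = grid[np.argmax(grid_revenue)]

print(f"Analytic optimum: {best_price:.2f}, grid optimum: {grid_best:.2f}")
```

For profit instead of revenue, the same approach works with $(p - c)\,q(p)$ for a unit cost $c$, giving $p^* = (bc - a)/(2b)$.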


## 3. Inventory Management & Reordering Predictions
**What is it?**

We want to make sure we have enough stock without over-ordering or running out.

**How we do it:**

- Use sales data, stock levels, and reorder quantities.
- Models like classification or clustering help decide when to reorder and how much.

In [None]:
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

# Binary label: 1 if stock is below the minimum threshold, else 0
dairy_data['Low Stock'] = (dairy_data['Quantity in Stock (liters/kg)'] < dairy_data['Minimum Stock Threshold (liters/kg)']).astype(int)

# Features: Stock levels, quantity sold, reorder quantities
features = ['Quantity in Stock (liters/kg)', 'Quantity Sold (liters/kg)', 'Reorder Quantity (liters/kg)']
X = dairy_data[features]
y = dairy_data['Low Stock']

# Train a Random Forest classifier (fixed seed for reproducible results)
model = RandomForestClassifier(random_state=42)
model.fit(X, y)

# Predict if stock is low for new data, using the same column names as in training
new_data = pd.DataFrame([[300, 150, 50]], columns=features)  # Example new data
prediction = model.predict(new_data)
print(f"Low stock prediction: {prediction}")
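Besides a learned classifier, a classical way to decide *when* to reorder is the reorder-point formula: reorder once stock falls to the expected demand over the supplier lead time plus a safety stock, where safety stock $= z \cdot \sigma_d \cdot \sqrt{L}$. The sketch below assumes roughly normal, independent daily demand and a fixed lead time; the sales figures, 3-day lead time, and 95% service level are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def reorder_point(daily_demand, lead_time_days, service_level=0.95):
    """Reorder point = expected demand over the lead time + safety stock.

    Assumes daily demand is roughly normal and independent across days,
    and that the supplier lead time is fixed.
    """
    d_bar = mean(daily_demand)
    sigma = stdev(daily_demand)
    z = NormalDist().inv_cdf(service_level)  # about 1.645 for a 95% service level
    safety_stock = z * sigma * math.sqrt(lead_time_days)
    return d_bar * lead_time_days + safety_stock, safety_stock

# Hypothetical daily sales (liters) for one product and a 3-day lead time
daily_sales = [180, 210, 195, 220, 170, 205, 190, 200, 215, 185]
rop, ss = reorder_point(daily_sales, lead_time_days=3)
stock_on_hand = 500
print(f"Reorder point: {rop:.1f}, safety stock: {ss:.1f}")
print("Reorder now" if stock_on_hand <= rop else "Stock is sufficient")
```

Raising the service level increases $z$ and therefore the safety stock, trading higher holding cost for fewer stock-outs.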


## 4. Product Recommendations
**What is it?**

We want to suggest products to customers based on their past purchases.

**How we do it:**

- Analyze customer purchase history and product preferences.
- Use collaborative filtering and deep learning to make recommendations.

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Example user-product matrix
user_product_matrix = np.array([
    [1, 0, 0, 1],  # User 1 bought products 1 and 4
    [0, 1, 1, 0],  # User 2 bought products 2 and 3
    [1, 1, 0, 0],  # User 3 bought products 1 and 2
])

# Compute cosine similarity between users
user_similarity = cosine_similarity(user_product_matrix)

# Example: rank the other users by similarity to user 1 (row 0), excluding user 1 itself
similar_users = [u for u in np.argsort(user_similarity[0])[::-1] if u != 0]
print(f"Users most similar to user 1: {[int(u) + 1 for u in similar_users]}")
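Finding similar users is the first half of user-based collaborative filtering; the second half turns those similarities into product suggestions. A minimal sketch on the same toy matrix: score each product by the similarity-weighted purchases of the other users, then drop products the user already bought. The `recommend` helper is hypothetical and uses 0-based user and product indices.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Same toy user-product matrix: rows are users, columns are products (1 = bought)
user_product_matrix = np.array([
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
])
user_similarity = cosine_similarity(user_product_matrix)

def recommend(user_idx, k=2):
    """Return up to k unseen products, ranked by similarity-weighted purchases of other users."""
    sims = user_similarity[user_idx].copy()
    sims[user_idx] = 0                                     # ignore the user's similarity to themselves
    scores = sims @ user_product_matrix                    # weighted sum of other users' purchases
    scores[user_product_matrix[user_idx] == 1] = -np.inf  # skip products already bought
    ranked = np.argsort(scores)[::-1]
    return [int(p) for p in ranked[:k] if np.isfinite(scores[p]) and scores[p] > 0]

print(recommend(0))  # 0-based product indices for user 1
```

For user 1 (row 0), only user 3 shares a purchase, so the sketch suggests user 3's other product, product 2 (index 1).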


## 5. Customer Segmentation
**What is it?**

We want to group customers into categories based on their buying habits, so we can target them better.

**How we do it:**

- Use data like customer location, product preferences, and revenue.
- Clustering algorithms help us divide customers into groups.

In [None]:
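from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import pandas as pd

# Features: customer location (text) and total revenue (numeric).
# K-means needs numeric input, so one-hot encode the location and standardize revenue.
X = pd.get_dummies(dairy_data['Customer Location'], prefix='Loc', dtype=float)
X['Revenue (scaled)'] = StandardScaler().fit_transform(dairy_data[['Approx. Total Revenue(INR)']]).ravel()

# Apply K-means clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
dairy_data['Customer Segment'] = kmeans.fit_predict(X)

# Check the clusters
print(dairy_data[['Customer Location', 'Customer Segment']])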
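The number of segments (`n_clusters=3` above) is a choice, not something K-means discovers. One common way to pick it is to try several values and keep the one with the highest silhouette score. The sketch below uses hypothetical, clearly separated 2-D customer features generated with `make_blobs`, so the "right" answer is known in advance.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical customer features (e.g. scaled revenue and order frequency) with 3 true groups
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 0], [0, 6]],
                  cluster_std=0.8, random_state=7)

# Try several cluster counts and keep the one with the highest silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print({k: round(s, 3) for k, s in scores.items()}, "-> best k:", best_k)
```

On real transaction data the peak is usually less clear-cut, so the score is best read alongside whether the segments are useful for targeting.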


## 6. Anomaly Detection for Sales or Production
**What is it?**

We want to find unusual patterns in sales or production, like sudden spikes or drops.

**How we do it:**

- Use sales quantities, prices, and time data.
- Models like Isolation Forest or One-Class SVM help detect anomalies.

In [None]:
from sklearn.ensemble import IsolationForest

# Features: Sales quantity
X = dairy_data[['Quantity Sold (liters/kg)']]

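# Train Isolation Forest for anomaly detection
# contamination=0.1 assumes about 10% of rows are anomalies; tune this for your data
model = IsolationForest(contamination=0.1, random_state=42)
dairy_data['Anomaly'] = model.fit_predict(X)  # -1 = anomaly, 1 = normal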

# Check for anomalies
anomalies = dairy_data[dairy_data['Anomaly'] == -1]
print(anomalies)
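The text also mentions One-Class SVM. A minimal sketch on hypothetical daily sales with one injected spike and one sharp drop: standardize the values, fit scikit-learn's `OneClassSVM` with an RBF kernel, and treat predictions of `-1` as anomalies. The `nu` value (roughly the expected fraction of anomalies) is an assumption to tune.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Hypothetical daily sales: mostly around 200 with one spike and one sharp drop
rng = np.random.default_rng(3)
sales = np.concatenate([rng.normal(200, 20, size=500), [800.0, 5.0]])
X = StandardScaler().fit_transform(sales.reshape(-1, 1))

# nu bounds the fraction of training points treated as outliers (an assumption to tune)
detector = OneClassSVM(kernel='rbf', gamma='scale', nu=0.05)
labels = detector.fit_predict(X)  # +1 = normal, -1 = anomaly

print("Flagged values:", np.round(sales[labels == -1], 1))
```

Unlike Isolation Forest, One-Class SVM is sensitive to feature scale, which is why the values are standardized first.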


## 7. Farm Performance Analysis
**What is it?**

We want to compare different farms to see which ones are performing better and why.

**How we do it:**

- Use data like farm size, land area, number of cows, and production quantity.
- Clustering and regression models help analyze performance.

In [None]:
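from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Features are on very different scales (acres, cow counts, liters), so standardize them first
X = StandardScaler().fit_transform(dairy_data[['Total Land Area (acres)', 'Number of Cows', 'Quantity (liters/kg)']])

# Apply K-means clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
dairy_data['Farm Performance'] = kmeans.fit_predict(X)

# Check the clusters
print(dairy_data[['Location', 'Farm Performance']])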
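Cluster numbers on their own do not say which farms perform better. A minimal sketch for interpreting them, on hypothetical farm data with the same three columns: cluster on standardized features, map the cluster centres back to original units with `inverse_transform`, and add a derived yield figure for each centre.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical farm records with the same three columns used above
rng = np.random.default_rng(5)
farms = pd.DataFrame({
    'Total Land Area (acres)': rng.uniform(10, 1000, size=60),
    'Number of Cows': rng.integers(10, 100, size=60),
    'Quantity (liters/kg)': rng.uniform(100, 1000, size=60),
})

scaler = StandardScaler()
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
farms['Cluster'] = kmeans.fit_predict(scaler.fit_transform(farms))

# Map the cluster centres back to original units so each cluster can be described
centres = pd.DataFrame(scaler.inverse_transform(kmeans.cluster_centers_),
                       columns=farms.columns[:3])
# Output per cow at each centre (a ratio of centre values, not an average of farm ratios)
centres['Liters per Cow'] = centres['Quantity (liters/kg)'] / centres['Number of Cows']
print(centres.round(1))
```

A regression of production on land area and herd size (as in use case 14) complements this by showing how much each input contributes.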


## 8. Predicting Product Expiry & Shelf-Life Optimization
**What is it?**

We want to predict when products will expire so we can sell them before it's too late.

**How we do it:**

- Use production and expiration dates, sales history, and stock levels.
- Classification models predict which products are at risk of expiring.

In [None]:
from sklearn.linear_model import LogisticRegression

# Features: days from sale date to expiration, quantity in stock
dairy_data['Days Until Expiration'] = (pd.to_datetime(dairy_data['Expiration Date']) - pd.to_datetime(dairy_data['Date'])).dt.days
features = ['Days Until Expiration', 'Quantity in Stock (liters/kg)']
X = dairy_data[features]
# Binary label: expiring in less than 30 days. The label is computed from a feature,
# so the model only relearns this cutoff; treat it as a toy example.
y = dairy_data['Days Until Expiration'] < 30

# Train logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Predict expiration risk, using the same column names as in training
new_data = pd.DataFrame([[10, 100]], columns=features)  # Example new data
prediction = model.predict(new_data)
print(f"Expiration risk: {prediction}")
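In the logistic regression above, the label is derived from `Days Until Expiration`, which is also a feature, so the model cannot tell us anything new. A more informative rule compares how long the current stock will take to sell with its remaining shelf life. The sketch below is a simple rule rather than a trained model; the rows and the `Avg Daily Sales` column are hypothetical and not part of the dataset.

```python
import pandas as pd

# Hypothetical stock records; 'Avg Daily Sales' is an assumed column, not in the dataset
stock = pd.DataFrame({
    'Product Name': ['Milk', 'Yogurt', 'Cheese', 'Ice Cream'],
    'Quantity in Stock (liters/kg)': [400, 120, 300, 50],
    'Avg Daily Sales': [100, 10, 2, 25],
    'Days Until Expiration': [3, 14, 60, 90],
})

# Days needed to sell the current stock at the recent sales rate
stock['Days to Sell Out'] = stock['Quantity in Stock (liters/kg)'] / stock['Avg Daily Sales']

# At risk if the stock will not sell out before it expires
stock['Expiry Risk'] = stock['Days to Sell Out'] > stock['Days Until Expiration']
print(stock[['Product Name', 'Days to Sell Out', 'Days Until Expiration', 'Expiry Risk']])
```

Flags produced this way (or observed waste records) could then serve as the target for a classifier trained on other features, such as product type and season.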


## 9. Profitability Analysis
**What is it?**

We want to know how much profit we make from each product or farm.

**How we do it:**

- Use data like revenue, costs, and production.
- Regression models help predict profitability.

In [None]:
X = dairy_data[['Price per Unit', 'Quantity Sold (liters/kg)']]
y = dairy_data['Approx. Total Revenue(INR)']

# Train a linear regression model
# (if revenue is roughly price x quantity, an additive linear model can only approximate it)
model = LinearRegression()
model.fit(X, y)

# Predict revenue for a new product, using the same column names as in training
new_data = pd.DataFrame([[50, 200]], columns=X.columns)  # Example new price and sales quantity
predicted_revenue = model.predict(new_data)
print(f"Predicted revenue: {predicted_revenue}")
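The model above treats revenue as an additive function of price and quantity. If revenue is essentially price multiplied by quantity, adding that product as an extra (interaction) feature lets a linear model capture it. The sketch below demonstrates the difference on hypothetical rows where revenue is exactly price times quantity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical sales rows where revenue is exactly price * quantity
rng = np.random.default_rng(2)
price = rng.uniform(20, 90, size=300)
quantity = rng.uniform(10, 500, size=300)
revenue = price * quantity

X_linear = np.column_stack([price, quantity])
X_interaction = np.column_stack([price, quantity, price * quantity])

# R^2 on the training data for each feature set
r2_linear = LinearRegression().fit(X_linear, revenue).score(X_linear, revenue)
r2_interaction = LinearRegression().fit(X_interaction, revenue).score(X_interaction, revenue)
print(f"R^2 without interaction: {r2_linear:.3f}, with interaction: {r2_interaction:.3f}")
```

Profit would additionally need a cost figure per unit or per farm, which would enter as revenue minus cost.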


## 10. Supply Chain Optimization
**What is it?**

We want to reduce delivery costs by finding the best routes and sales strategies.

**How we do it:**

- Use customer and farm locations, sales channels, and quantities sold.
- Optimization algorithms like genetic algorithms can help improve routes.

In [None]:
# A placeholder for genetic algorithm implementation (this would be more complex in a real application)
# You would typically use libraries like DEAP to implement Genetic Algorithms.

def evaluate_route(route):
    # Placeholder: A function to evaluate the cost of a route
    return sum(route)

# Example of route cost calculation
route = [10, 20, 15, 25]  # Distances between locations
cost = evaluate_route(route)
print(f"Route cost: {cost}")
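To go beyond the placeholder, here is a minimal genetic algorithm for a delivery route in plain Python (not DEAP), treating the problem as a travelling-salesman tour over hypothetical farm and customer coordinates. It uses tournament selection, a simple variant of order crossover, swap mutation, and elitism; the population size, generation count, and mutation rate are illustrative assumptions.

```python
import math
import random

def route_length(route, points):
    """Total length of a closed tour visiting points in the given order."""
    return sum(math.dist(points[route[i]], points[route[(i + 1) % len(route)]])
               for i in range(len(route)))

def order_crossover(p1, p2, rng):
    """Copy a slice from parent 1, then fill the gaps with the remaining cities in parent 2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    used = set(child[a:b + 1])
    fill = [c for c in p2 if c not in used]
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def mutate(route, rng, rate=0.1):
    """Swap two cities with probability `rate`; returns a new list."""
    route = route[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(route)), 2)
        route[i], route[j] = route[j], route[i]
    return route

def genetic_tsp(points, pop_size=60, generations=200, elite=2, seed=0):
    rng = random.Random(seed)
    n = len(points)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    initial_best = min(route_length(r, points) for r in population)

    for _ in range(generations):
        population.sort(key=lambda r: route_length(r, points))
        next_gen = population[:elite]  # elitism: carry the best routes over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the best of 3 random routes, done twice
            p1 = min(rng.sample(population, 3), key=lambda r: route_length(r, points))
            p2 = min(rng.sample(population, 3), key=lambda r: route_length(r, points))
            next_gen.append(mutate(order_crossover(p1, p2, rng), rng))
        population = next_gen

    best = min(population, key=lambda r: route_length(r, points))
    return best, route_length(best, points), initial_best

# Hypothetical farm/customer coordinates (km)
rng = random.Random(42)
locations = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(12)]
best_route, best_cost, start_cost = genetic_tsp(locations)
print(f"Best route: {best_route}")
print(f"Cost: {best_cost:.1f} km (best initial random route: {start_cost:.1f} km)")
```

Elitism guarantees the best route never gets worse from one generation to the next; a real system would also account for vehicle capacity, delivery windows, and road distances rather than straight lines.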


## 11. Product Quality Prediction
**What is it?**

We want to predict which products are likely to have quality issues.

**How we do it:**

- Use farm and production data to predict quality.
- Models like classification or regression help with quality predictions.

In [None]:
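from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Farm Size is text (Small/Medium/Large), so map it to an ordinal code first
size_codes = {'Small': 0, 'Medium': 1, 'Large': 2}
dairy_data['Farm Size Code'] = dairy_data['Farm Size'].map(size_codes)
features = ['Farm Size Code', 'Number of Cows', 'Quantity (liters/kg)']

# Placeholder target: the dataset has no quality label, so supply your own 'Quality' column
if 'Quality' in dairy_data.columns:
    # Train a decision tree classifier
    model = DecisionTreeClassifier(random_state=42)
    model.fit(dairy_data[features], dairy_data['Quality'])

    # Predict product quality
    new_data = pd.DataFrame([[2, 100, 500]], columns=features)  # Large farm, 100 cows, 500 liters/kg
    print(f"Predicted quality: {model.predict(new_data)}")
else:
    print("No 'Quality' column found; add a quality label to train this model.")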
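Once a quality label exists, the model should be judged on rows it was not trained on. A minimal sketch with a hold-out split, using hypothetical labelled farm data in which quality is `'Low'` whenever many cows share little output; the labelling rule exists only to make the example self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical labelled data: quality is 'Low' when many cows share little output
rng = np.random.default_rng(4)
farms = pd.DataFrame({
    'Farm Size Code': rng.integers(0, 3, size=400),  # 0=Small, 1=Medium, 2=Large
    'Number of Cows': rng.integers(10, 100, size=400),
    'Quantity (liters/kg)': rng.uniform(100, 1000, size=400),
})
quality = np.where((farms['Number of Cows'] > 60) & (farms['Quantity (liters/kg)'] < 400), 'Low', 'OK')

# Hold out 25% of rows to measure accuracy on data the tree has not seen
X_train, X_test, y_train, y_test = train_test_split(
    farms, quality, test_size=0.25, random_state=0, stratify=quality)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"Hold-out accuracy: {accuracy:.2f}")
```

Limiting `max_depth` keeps the tree small enough to read as a set of rules, which is often as useful as the predictions themselves.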


## 12. Brand Performance and Market Share Analysis
**What is it?**

We want to compare how different brands are doing in terms of sales and market share.

**How we do it:**

- Use brand data, product sales, and customer locations.
- Clustering or classification models help compare brand performance.

In [None]:
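from sklearn.cluster import KMeans
import pandas as pd

# Aggregate revenue per brand and derive each brand's share of total revenue
brand_stats = dairy_data.groupby('Brand')['Approx. Total Revenue(INR)'].sum().to_frame('Total Revenue')
brand_stats['Market Share (%)'] = 100 * brand_stats['Total Revenue'] / brand_stats['Total Revenue'].sum()

# Cluster brands by market share (a single numeric feature, so no scaling is needed)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
brand_stats['Brand Segment'] = kmeans.fit_predict(brand_stats[['Market Share (%)']])

# Check brand clusters, largest share first
print(brand_stats.sort_values('Market Share (%)', ascending=False))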


## 13. Predicting Customer Churn
**What is it?**

We want to predict which customers are likely to stop buying from us.

**How we do it:**

- Analyze customer purchase history and preferences.
- Classification models like decision trees help predict churn.

In [None]:
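from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Encode the text 'Sales Channel' as integer category codes so the tree can use it
dairy_data['Sales Channel Code'] = dairy_data['Sales Channel'].astype('category').cat.codes
features = ['Sales Channel Code', 'Quantity Sold (liters/kg)', 'Approx. Total Revenue(INR)']

# Placeholder target: the dataset has no churn label, so supply your own 'Churn' column
if 'Churn' in dairy_data.columns:
    # Train a decision tree model
    model = DecisionTreeClassifier(random_state=42)
    model.fit(dairy_data[features], dairy_data['Churn'])

    # Predict churn for new customer data
    new_data = pd.DataFrame([[1, 200, 15000]], columns=features)  # Example new customer data
    print(f"Churn prediction: {model.predict(new_data)}")
else:
    print("No 'Churn' column found; add a churn label to train this model.")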
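A churn model needs a churn label, and a common way to create one is from recency: a customer counts as churned if they have not bought anything within some window before a snapshot date. The sketch below assumes a hypothetical `Customer ID` column (the dataset does not have one), a hypothetical purchase log, and a 90-day window.

```python
import pandas as pd

# Hypothetical purchase log; 'Customer ID' is an assumed column the dataset does not have
purchases = pd.DataFrame({
    'Customer ID': ['C1', 'C1', 'C2', 'C3', 'C3', 'C3'],
    'Date': pd.to_datetime(['2022-01-05', '2022-06-20', '2022-02-10',
                            '2022-03-01', '2022-05-15', '2022-06-28']),
})

snapshot = pd.Timestamp('2022-06-30')  # date the analysis is run
churn_window_days = 90                 # assumption: no purchase in 90 days = churned

last_purchase = purchases.groupby('Customer ID')['Date'].max()
days_since = (snapshot - last_purchase).dt.days
churn = (days_since > churn_window_days).astype(int).rename('Churn')
print(pd.concat([days_since.rename('Days Since Last Purchase'), churn], axis=1))
```

To avoid leaking the answer into the model, features should then be computed only from purchases made before the window that defines the label.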


## 14. Sustainability and Resource Utilization
**What is it?**

We want to use land and resources efficiently to reduce waste and environmental impact.

**How we do it:**

- Use data like land area, farm size, and production quantity.
- Regression and clustering models help analyze resource use.

In [None]:
X = dairy_data[['Total Land Area (acres)', 'Number of Cows']]
y = dairy_data['Quantity (liters/kg)']

# Train a linear regression model
model = LinearRegression()
model.fit(X, y)

# Predict production for new farm data, using the same column names as in training
new_data = pd.DataFrame([[500, 100]], columns=X.columns)  # Example land area (acres) and cow count
predicted_production = model.predict(new_data)
print(f"Predicted production: {predicted_production}")
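Simple efficiency ratios are a useful companion to the regression: output per cow and output per acre show directly which farms get the least from their resources. A minimal sketch on hypothetical farms (A to D), flagging those in the bottom half on both measures for review; the median cutoff is an illustrative assumption.

```python
import pandas as pd

# Hypothetical farms using the dataset's column names
farms = pd.DataFrame({
    'Farm': ['A', 'B', 'C', 'D'],
    'Total Land Area (acres)': [300, 20, 600, 850],
    'Number of Cows': [90, 40, 25, 20],
    'Quantity (liters/kg)': [900, 600, 500, 120],
})

# Resource-efficiency ratios
farms['Output per Cow'] = farms['Quantity (liters/kg)'] / farms['Number of Cows']
farms['Output per Acre'] = farms['Quantity (liters/kg)'] / farms['Total Land Area (acres)']

# Flag farms below the median on both measures as candidates for review
low_cow = farms['Output per Cow'] < farms['Output per Cow'].median()
low_acre = farms['Output per Acre'] < farms['Output per Acre'].median()
farms['Review'] = low_cow & low_acre
print(farms.round(2))
```

Here only farm D is flagged: it has the lowest output per cow and the lowest output per acre.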
