### Assuming you have a working ML model that can process individual images and identify carrots, how would you adapt it so that you could feed it live video from inside a grocery store and have it create a record of any carrots it sees?

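The trained image model itself does not need to change much. Most of the work is in the pipeline around it:

1. Ingest the stream: read the live camera feed frame by frame instead of loading static images, and preprocess each frame (resizing, normalization) the same way as the training images. If the model cannot keep up with the camera's frame rate, run it on a subset of frames, such as every nth frame.
2. Detect per frame: pass each frame to the model to identify any carrots and their locations within the frame.
3. Deduplicate across frames: the same carrot stays in view for many consecutive frames, so detections must be linked over time (for example, by matching overlapping bounding boxes between nearby frames) to avoid counting it multiple times.
4. Record each carrot: when a new carrot is identified, write a data record containing its location within the frame, a timestamp, the camera it was seen on, and any other relevant details such as size or orientation.

Together these steps turn per-image predictions into a running record of which carrots appeared in the store's video, and where.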
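A minimal sketch of the frame-by-frame record keeping described above, assuming the existing model can be wrapped as a function `detect(frame)` that returns carrot bounding boxes. `detect`, `track_carrots`, `CarrotRecord`, the `(x1, y1, x2, y2)` box format and both thresholds are hypothetical. In practice frames could come from a video source such as OpenCV's `cv2.VideoCapture`; here they are plain `(timestamp, frame)` pairs so the logic runs without a camera. Deduplication uses simple greedy intersection-over-union (IoU) matching, one of several possible approaches.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

# Bounding box as (x1, y1, x2, y2) in pixel coordinates (an assumed format)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class CarrotRecord:
    track_id: int
    first_seen: float  # timestamp (seconds) of the first frame with this carrot
    last_seen: float   # timestamp of the most recent matching frame
    box: Box           # most recent bounding box


def track_carrots(
    frames: Iterable[Tuple[float, Any]],
    detect: Callable[[Any], List[Box]],
    iou_threshold: float = 0.3,
    max_gap: float = 1.0,
) -> List[CarrotRecord]:
    """Run `detect` on each (timestamp, frame) pair and merge detections of the
    same carrot across frames, so each carrot yields one record, not one per frame."""
    records: List[CarrotRecord] = []
    active: List[CarrotRecord] = []
    for ts, frame in frames:
        # Forget tracks that have not been matched within `max_gap` seconds
        active = [r for r in active if ts - r.last_seen <= max_gap]
        matched = set()  # track ids already claimed by a detection in this frame
        for box in detect(frame):
            # Greedily pick the unclaimed track with the highest IoU above the threshold
            best, best_score = None, iou_threshold
            for r in active:
                if r.track_id in matched:
                    continue
                score = iou(r.box, box)
                if score >= best_score:
                    best, best_score = r, score
            if best is None:
                # No overlapping track: treat this as a newly seen carrot
                best = CarrotRecord(len(records), ts, ts, box)
                records.append(best)
                active.append(best)
            else:
                best.last_seen, best.box = ts, box
            matched.add(best.track_id)
    return records


# Stand-in detector: each fake "frame" is already a list of boxes
frames = [
    (0.0, [(0, 0, 10, 10)]),
    (0.1, [(1, 1, 11, 11)]),                    # same carrot, slightly moved
    (0.2, [(1, 1, 11, 11), (50, 50, 60, 60)]),  # a second carrot appears
]
for rec in track_carrots(frames, detect=lambda f: f):
    print(rec)
```

The three frames produce two records rather than four detections. A dedicated multi-object tracker would handle occlusions and carrots crossing paths better than this greedy matcher, but the shape of the pipeline stays the same.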

# Toy example 

I implemented a toy example to demonstrate basic classification techniques. It compares several classifiers (Random Forest, Extra Trees, Decision Tree and LightGBM) on a loan-approval dataset, predicting Loan_Status from applicant details. This is only a surface test: every model runs with its default parameters and no hyperparameter tuning, and each is scored once on a single 70/30 train/test split using accuracy, precision and F1-score. The results give a rough first impression of how these classifiers compare on this problem rather than a reliable ranking.

In [33]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# Read the CSV file into a DataFrame
df = pd.read_csv(r"data.csv")
# Display the first few rows of the DataFrame to inspect the data
df.head()


| Unnamed: 0 | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| 3 | LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| 4 | LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |


In [34]:
# Randomly sample 89 of the approved applications (Loan_Status == 'Y')
approved = df.loc[df['Loan_Status'] == 'Y'].sample(n=89, random_state=42)
# Keep all of the rejected applications (Loan_Status == 'N')
rejected = df.loc[df['Loan_Status'] == 'N']
# Concatenate the two subsets into a new DataFrame
df = pd.concat([approved, rejected])
# Separate the features (X) from the target variable (y)
X, y = df.drop('Loan_Status', axis=1), df['Loan_Status']
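This cell keeps 89 randomly chosen approved loans and every rejected one, so the class mix of the new df depends on how many 'N' rows the file contains. If the intent is a 50/50 balance, one option is to derive the sample size from the smaller class instead of hard-coding it. A sketch on a made-up frame; `undersample_to_minority` is a hypothetical helper built on pandas' `GroupBy.sample` (available from pandas 1.1):

```python
import pandas as pd


def undersample_to_minority(df: pd.DataFrame, target: str, random_state: int = 42) -> pd.DataFrame:
    """Randomly downsample every class of `target` to the size of the rarest class."""
    n_min = int(df[target].value_counts().min())
    # Sample n_min rows from each class group; all columns are kept
    return df.groupby(target).sample(n=n_min, random_state=random_state)


# Made-up frame with 6 approved ('Y') and 3 rejected ('N') rows
toy = pd.DataFrame({"Loan_Status": ["Y"] * 6 + ["N"] * 3, "ApplicantIncome": range(9)})
balanced = undersample_to_minority(toy, "Loan_Status")
print(balanced["Loan_Status"].value_counts())
```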


In [35]:
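# Recode Credit_History (1.0/0.0) as 'Yes'/'No' so it is treated as a categorical feature
X = X.replace({'Credit_History': {1.0: 'Yes', 0.0: 'No'}})
# Loan_ID is a unique identifier, not a predictive feature
X = X.drop('Loan_ID', axis=1)
X.Credit_History.value_counts()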

Credit_History
Yes    176
No      84
Name: count, dtype: int64

In [41]:
# Convert the target variable 'y' to a categorical type
y = y.astype('category')
# Convert the categorical variable 'y' to numerical codes
y = y.cat.codes


Married : 2
Dependents : 5
Education : 2
Self_Employed : 3
Credit_History : 3
Property_Area : 3
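Because `y.cat.codes` is what the metrics later receive, it matters which label becomes 1: scikit-learn's `precision_score` and `f1_score` treat label 1 as the positive class by default (`pos_label=1`). A quick check on made-up values in the same 'Y'/'N' format (`y_demo` and `mapping` are illustrative names):

```python
import pandas as pd

# Made-up target values in the same 'Y'/'N' format as Loan_Status
y_demo = pd.Series(["Y", "N", "Y", "N", "Y"]).astype("category")

# Categories inferred from string values are sorted, so 'N' gets code 0 and 'Y' gets code 1
mapping = dict(enumerate(y_demo.cat.categories))
codes = y_demo.cat.codes

print(mapping)         # {0: 'N', 1: 'Y'}
print(codes.tolist())  # [1, 0, 1, 0, 1]
```

So in this notebook, precision and F1 describe the approved ('Y') class.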


In [43]:
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder, PowerTransformer

In [44]:
# Identify categorical features (columns) in the DataFrame 'X'
cat_feats = X.dtypes[X.dtypes == 'object'].index.tolist()
# Identify numerical features (columns) in the DataFrame 'X'
num_feats = X.dtypes[~X.dtypes.index.isin(cat_feats)].index.tolist()
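Here num_feats is every column that is not an object column, so any numeric column that is not a genuine feature slips in. The head() output above shows an `Unnamed: 0` column (typically a row index saved by an earlier `to_csv` call), and only Loan_ID is dropped from X, so that column would be transformed and fed to the models as if it were a predictor. A quick check on made-up rows, using pandas' `is_numeric_dtype` for the split (a swap-in for the `== 'object'` comparison); `X_demo`, `num_demo`, `cat_demo` and `X_clean` are hypothetical names:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up rows mimicking the loan data, including the leftover 'Unnamed: 0' index column
X_demo = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Gender": ["Male", "Male", "Female"],
    "ApplicantIncome": [5849, 4583, 3000],
    "Credit_History": ["Yes", None, "No"],
})

# Numeric columns become numeric features; everything else is treated as categorical
num_demo = [c for c in X_demo.columns if is_numeric_dtype(X_demo[c])]
cat_demo = [c for c in X_demo.columns if c not in num_demo]
print(num_demo)  # ['Unnamed: 0', 'ApplicantIncome']
print(cat_demo)  # ['Gender', 'Credit_History']

# Dropping identifier-like columns before the split keeps them out of the features
X_clean = X_demo.drop(columns=["Unnamed: 0"])
```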


In [45]:
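from sklearn.base import BaseEstimator, TransformerMixin

# A custom transformer that applies a function to a DataFrame.
# Inheriting from BaseEstimator and TransformerMixin provides get_params() and
# fit_transform(), so the class behaves like any other scikit-learn transformer.
class DataframeFunctionTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, func):
        self.func = func

    # Apply the function to the input DataFrame
    def transform(self, input_df, **transform_params):
        return self.func(input_df)

    # Fit does nothing because the transformation does not require training
    def fit(self, X, y=None, **fit_params):
        return self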


In [46]:
# Combine 'ApplicantIncome' and 'CoapplicantIncome', take the logarithm, and store the result in a new column
def combineAndLog(dataFrame):
    dataFrame = dataFrame.copy()  # work on a copy instead of mutating the caller's DataFrame
    dataFrame["combinedIncomesLog"] = np.log(dataFrame['ApplicantIncome'] + dataFrame['CoapplicantIncome'])
    return dataFrame


In [47]:
# Apply a logarithm transformation to 'LoanAmount' and store the result in a new column 'loanLog'
def logTransformer(dataFrame):
    dataFrame = dataFrame.copy()  # work on a copy instead of mutating the caller's DataFrame
    dataFrame['loanLog'] = np.log(dataFrame["LoanAmount"])  # missing amounts stay NaN
    return dataFrame

# Split the features (X) and target variable (y) into 70% training and 30% testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=12)

# Define a pipeline for numerical features
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),  # Impute missing values using mean
    ('scaler', PowerTransformer())  # Yeo-Johnson power transform, then standardize (PowerTransformer's defaults)
])

# Define a pipeline for categorical features
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),  # Impute missing values using most frequent value
    ('encoder', OneHotEncoder(handle_unknown='ignore'))  # Encode categorical features using OneHotEncoder
])

# Combine the transformers for numerical and categorical features
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, num_feats),  # Apply numeric transformer to numerical features
        ('cat', categorical_transformer, cat_feats)  # Apply categorical transformer to categorical features
    ],
    remainder='drop',  # Drop any remaining columns that are not transformed
    verbose=True
)

# Fit and transform the preprocessor on the training data
preprocessor.fit_transform(X_train, y_train)


[ColumnTransformer] ........... (1 of 2) Processing num, total=   0.0s
[ColumnTransformer] ........... (2 of 2) Processing cat, total=   0.0s


array([[-1.24028679, -1.06409999, -1.87641935, ...,  1.        ,
         0.        ,  0.        ],
       [-0.29071898, -1.06409999, -1.19528473, ...,  1.        ,
         0.        ,  0.        ],
       [ 1.45135659,  0.6837092 ,  1.00097993, ...,  0.        ,
         0.        ,  1.        ],
       ...,
       [-0.84733623, -1.06409999, -1.37695752, ...,  0.        ,
         0.        ,  1.        ],
       [-0.26284516, -1.06409999, -0.49596344, ...,  1.        ,
         0.        ,  0.        ],
       [ 1.39697213, -1.06409999, -1.1381979 , ...,  0.        ,
         0.        ,  1.        ]])
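combineAndLog and logTransformer are defined above, but neither is added to preprocessor, so the models never see the log features. A sketch of one way to wire them in, using scikit-learn's built-in `FunctionTransformer`, which plays the same role as the custom DataframeFunctionTransformer. `add_log_features`, `log_feats` and `demo_pipe` are hypothetical names, and the mean imputer stands in for the full numeric pipeline:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def add_log_features(df: pd.DataFrame) -> pd.DataFrame:
    """Same idea as combineAndLog + logTransformer, applied to a copy of the input."""
    df = df.copy()
    df["combinedIncomesLog"] = np.log(df["ApplicantIncome"] + df["CoapplicantIncome"])
    df["loanLog"] = np.log(df["LoanAmount"])  # missing amounts stay NaN
    return df


log_feats = ["combinedIncomesLog", "loanLog"]

# The engineered columns only exist after the first step, so they are listed
# explicitly instead of being taken from num_feats (computed on the raw columns).
demo_pipe = Pipeline(steps=[
    ("add_logs", FunctionTransformer(add_log_features)),
    ("columns", ColumnTransformer(
        transformers=[("log", SimpleImputer(strategy="mean"), log_feats)],
        remainder="drop",
    )),
])

# First three rows of the head() output above
toy = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, 3000],
    "CoapplicantIncome": [0.0, 1508.0, 0.0],
    "LoanAmount": [np.nan, 128.0, 66.0],
})
features = demo_pipe.fit_transform(toy)
print(features.shape)  # (3, 2)
```

In the notebook, the same first step could be placed in front of preprocessor, with its numeric column list extended by the new log columns.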

In [48]:
# Import the classifiers to compare
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier
from lightgbm import LGBMClassifier

# Define names and objects for the different classifiers
model_names = ["RandomForestClassifier", "ExtraTreesClassifier", "DecisionTreeClassifier", "LGBMClassifier"]
models = [
    RandomForestClassifier(random_state=1),
    ExtraTreesClassifier(random_state=1),
    DecisionTreeClassifier(random_state=1),
    LGBMClassifier(random_state=1)
]
# Store the pairs as a list: a bare zip() is a one-shot iterator that would be empty if reused
zipped_combo = list(zip(model_names, models))

# Import the metrics used for evaluation
from sklearn.metrics import accuracy_score, precision_score, f1_score

# Fit a classifier pipeline, print its test-set metrics and return its accuracy
def fit_classifier(pipeline, x_train, y_train, x_test, y_test):
    model_fit = pipeline.fit(x_train, y_train)
    y_pred = model_fit.predict(x_test)
    modelAccuracy_score = accuracy_score(y_test, y_pred)
    print('accuracy_score:', modelAccuracy_score)
    # precision and F1 treat label 1 (approved, 'Y') as the positive class
    print('precision:', precision_score(y_test, y_pred))
    print('f1_score:', f1_score(y_test, y_pred))
    print("\n")
    return modelAccuracy_score

from sklearn.base import clone

# Fit and evaluate each (name, model) pair; return a list of (name, accuracy) tuples
def classifier(named_models, X_train, y_train, X_test, y_test):
    assessment = []
    for model_Name, model in named_models:
        # clone() gives every pipeline its own unfitted copy of the shared preprocessor
        model_test_pipe = Pipeline([
            ('preprocessor', clone(preprocessor)),
            ('classifier', model)
        ])
        print('Name of model that is being tested: ', model_Name)
        modelTest = fit_classifier(model_test_pipe, X_train, y_train, X_test, y_test)
        assessment.append((model_Name, modelTest))
    return assessment
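fit_classifier prints accuracy, precision and F1 but not recall, and for RandomForest recall turns out to be the weak spot. All three printed metrics are ratios of confusion-matrix counts, so that model's line can be reproduced by hand. The counts below (11 true positives, 8 false positives, 20 false negatives, 46 true negatives, with 'Y' as the positive class) are not printed by the notebook: they are back-solved from the printed scores under the assumption of an 85-row test set. `binary_metrics` is a hypothetical helper:

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts back-solved from the RandomForestClassifier output (assumed 85-row test set)
rf = binary_metrics(tp=11, fp=8, fn=20, tn=46)
print({name: round(value, 4) for name, value in rf.items()})
# {'accuracy': 0.6706, 'precision': 0.5789, 'recall': 0.3548, 'f1': 0.44}
```

Because F1 is a harmonic mean, a recall of about 0.35 pulls it well below the 0.58 precision.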


In [49]:
# Apply the classifier function to the zipped combo of model names and models, and the training and testing data
result = classifier(zipped_combo, X_train, y_train, X_test, y_test)


Name of model that is being tested:  RandomForestClassifier
[ColumnTransformer] ........... (1 of 2) Processing num, total=   0.0s
[ColumnTransformer] ........... (2 of 2) Processing cat, total=   0.0s
accuracy_score: 0.6705882352941176
precision: 0.5789473684210527
f1_score: 0.44


Name of model that is being tested:  ExtraTreesClassifier
[ColumnTransformer] ........... (1 of 2) Processing num, total=   0.0s
[ColumnTransformer] ........... (2 of 2) Processing cat, total=   0.0s
accuracy_score: 0.6705882352941176
precision: 0.5517241379310345
f1_score: 0.5333333333333333


Name of model that is being tested:  DecisionTreeClassifier
[ColumnTransformer] ........... (1 of 2) Processing num, total=   0.0s
[ColumnTransformer] ........... (2 of 2) Processing cat, total=   0.0s
accuracy_score: 0.6470588235294118
precision: 0.5151515151515151
f1_score: 0.53125


Name of model that is being tested:  LGBMClassifier
[ColumnTransformer] ........... (1 of 2) Processing num, total=   0.0s
[ColumnTra

found 0 physical cores < 1
  File "c:\Users\james\AppData\Local\Programs\Python\Python311\Lib\site-packages\joblib\externals\loky\backend\context.py", line 282, in _count_physical_cores
    raise ValueError(f"found {cpu_count_physical} physical cores < 1")
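The completed runs differ by only about 0.02 in accuracy on a single random split, a gap that a different random_state in train_test_split could plausibly reverse. A sketch of a steadier comparison using scikit-learn's `cross_validate` with stratified 5-fold cross-validation; this is a swap-in, not what the notebook does. It runs on synthetic data from `make_classification` so it is self-contained (the sample size and class weights are arbitrary), and LightGBM is left out to avoid the extra dependency. In the notebook the estimator would be `Pipeline([('preprocessor', preprocessor), ('classifier', model)])` and the data X, y. `candidates` and `cv_summary` are hypothetical names:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data: a few hundred rows, imbalanced classes
X_demo, y_demo = make_classification(
    n_samples=280, n_features=10, weights=[0.68], random_state=0
)

candidates = {
    "RandomForestClassifier": RandomForestClassifier(random_state=1),
    "ExtraTreesClassifier": ExtraTreesClassifier(random_state=1),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=1),
}

# Stratified folds keep the class ratio the same in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
cv_summary = {}
for name, model in candidates.items():
    scores = cross_validate(model, X_demo, y_demo, cv=cv, scoring=["accuracy", "precision", "f1"])
    # Report the mean and spread across folds rather than a single number
    cv_summary[name] = {
        metric: (scores[f"test_{metric}"].mean(), scores[f"test_{metric}"].std())
        for metric in ["accuracy", "precision", "f1"]
    }
    print(name, {m: f"{mu:.3f} ± {sd:.3f}" for m, (mu, sd) in cv_summary[name].items()})
```

If the fold-to-fold spread is as large as the gap between models, the single-split ranking above should not be read as meaningful.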
