## Random Forest Classifier

* In classification, a random forest algorithm builds multiple decision trees and combines their predictions to obtain the final classification result.
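* Each tree is trained on a random sample of the training data (in scikit-learn, a bootstrap sample drawn with replacement) and considers only a random subset of the features when choosing each split.
* The final prediction is made by aggregating the predictions of all the individual trees in the forest, typically by majority vote (scikit-learn averages the trees' predicted class probabilities).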
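
The three points above can be sketched by hand. This is a minimal illustration on synthetic data, not how the model is trained later in this notebook: the names `X`, `y`, `trees`, and `votes` are made up for the example, and drawing rows with replacement is an assumption that matches scikit-learn's default behaviour.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic binary problem: 200 samples, 8 features (illustrative only).
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

n_trees = 15  # odd, so a binary majority vote cannot tie
trees = []
for i in range(n_trees):
    # Random subset of the training rows, drawn with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" makes each split look at a random subset of features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# One row of predictions per tree, then a majority vote per sample.
votes = np.stack([t.predict(X) for t in trees])
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)
toy_accuracy = (forest_pred == y).mean()
```

`RandomForestClassifier` does the same kind of work internally, with additional options for tree depth, parallelism, and out-of-bag scoring.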

### Import Libraries

* <b>Flask</b>: A micro web framework for building web applications in Python.
* <b>numpy</b>: Numerical computing library with support for large, multi-dimensional arrays and matrices.
* <b>scikit-learn</b>: Machine learning library for Python with tools for classification, regression, clustering, and dimensionality reduction.
* <b>PIL (Python Imaging Library)</b>: Adds support for opening, manipulating, and saving different image file formats. The `PIL` module is provided by the Pillow package.
* <b>os</b>: Interacts with the operating system, including reading and writing files, managing processes, and setting environment variables.
* <b>pickle</b>: Serializes and deserializes Python objects so that they can be saved to and loaded from files.

In [18]:
from flask import Flask, render_template, request
import numpy as np
from PIL import Image
from sklearn.ensemble import RandomForestClassifier
import os
import pickle

### Paths to the training and test sets

<b>Dataset used</b>: [CIFAKE: Real and AI-Generated Synthetic Images](https://www.kaggle.com/datasets/birdy654/cifake-real-and-ai-generated-synthetic-images)

In [2]:
train_path = 'C:/Users/aadit/anaconda3/111Test/CIFAKE/train'
test_path = 'C:/Users/aadit/anaconda3/111Test/CIFAKE/test'

### Classes for binary classification

In [3]:
classes = ['real', 'fake']
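
The position of each name in this list is what the loading cells use as the integer label (via `classes.index(c)`). A small sketch of that mapping in both directions; `label_of` and `name_of` are hypothetical helper names, not part of the notebook:

```python
classes = ['real', 'fake']

# Class name -> integer label, equivalent to classes.index(name).
label_of = {name: i for i, name in enumerate(classes)}

# Integer label -> class name, useful for turning a prediction back into text.
name_of = dict(enumerate(classes))
```

With this ordering, a predicted label of `0` means "real" and `1` means "fake".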

### Lists to store the image data and labels

In [4]:
train_data = []
train_labels = []
test_data = []
test_labels = []


### Read in the training data

In [5]:
for c in classes:
    path = os.path.join(train_path, c)
    for image_file in os.listdir(path):
        image_path = os.path.join(path, image_file)
        image = Image.open(image_path).convert('L') # convert to grayscale
        image = image.resize((32, 32)) # resize the image to 32x32
        image_data = np.asarray(image).flatten() # flatten the image data
        train_data.append(image_data)
        train_labels.append(classes.index(c))

### Read in the test data

In [6]:
for c in classes:
    path = os.path.join(test_path, c)
    for image_file in os.listdir(path):
        image_path = os.path.join(path, image_file)
        image = Image.open(image_path).convert('L') # convert to grayscale
        image = image.resize((32, 32)) # resize the image to 32x32
        image_data = np.asarray(image).flatten() # flatten the image data
        test_data.append(image_data)
        test_labels.append(classes.index(c))
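
The training and test loops differ only in the root folder, so they could be folded into one helper. The sketch below is one possible refactoring, not the notebook's code: `load_split` is a hypothetical name, and the demo builds a throwaway folder of blank images purely so that the example runs on its own.

```python
import os
import tempfile

import numpy as np
from PIL import Image


def load_split(root, classes, size=(32, 32)):
    """Return (data, labels) for every image under root/<class name>/."""
    data, labels = [], []
    for label, name in enumerate(classes):
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            # Open, convert to grayscale, resize, and flatten to a 1-D vector.
            with Image.open(os.path.join(folder, fname)) as img:
                pixels = np.asarray(img.convert('L').resize(size)).flatten()
            data.append(pixels)
            labels.append(label)
    return np.array(data), np.array(labels)


# Demo on a temporary directory with three blank images per class.
with tempfile.TemporaryDirectory() as root:
    for name in ['real', 'fake']:
        os.makedirs(os.path.join(root, name))
        for i in range(3):
            img = Image.new('RGB', (32, 32), color=(i * 10, 0, 0))
            img.save(os.path.join(root, name, f'{i}.png'))
    demo_X, demo_y = load_split(root, ['real', 'fake'])
```

Each 32x32 grayscale image becomes a row of 1,024 pixel values, so the result is a 2-D array with one row per image, which is the layout `fit` and `score` expect.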

### Train the model

In [7]:
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(train_data, train_labels)


RandomForestClassifier(random_state=0)

### Accuracy

In [8]:
accuracy = rf.score(test_data, test_labels)
print("Accuracy: {:.2f}%".format(accuracy*100))

Accuracy: 81.22%
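
`score` reports plain accuracy: the fraction of test samples whose predicted label matches the true label. A confusion matrix breaks that single number down by class. The sketch below uses synthetic data rather than the CIFAKE images, and `X_demo`, `y_demo`, and `clf` are hypothetical names chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

# Synthetic "pixel" features; the label depends on the first feature only.
X_demo = rng.integers(0, 256, size=(300, 16))
y_demo = (X_demo[:, 0] > 127).astype(int)

clf = RandomForestClassifier(n_estimators=20, random_state=0)
clf.fit(X_demo[:200], y_demo[:200])

# Accuracy computed by hand from the predictions on the held-out rows.
pred = clf.predict(X_demo[200:])
manual_accuracy = np.mean(pred == y_demo[200:])

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_demo[200:], pred, labels=[0, 1])
```

The diagonal of `cm` counts correct predictions per class, so the diagonal sum divided by the total equals the accuracy that `score` returns.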


### Create pickle file

In [19]:
with open('random_forest_classifier.pkl', 'wb') as f:
    pickle.dump(rf, f)
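
A saved model can be loaded again with `pickle.load` and used for prediction without retraining. The following is a minimal round-trip sketch under stated assumptions: it trains a small stand-in model on random data and writes to a temporary file named `demo_model.pkl`, both of which are made up for the example.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-in data shaped like flattened 32x32 grayscale images.
X_small = rng.integers(0, 256, size=(50, 1024))
y_small = rng.integers(0, 2, size=50)
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X_small, y_small)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_model.pkl')
    # Save the fitted model, then read it back from disk.
    with open(path, 'wb') as f:
        pickle.dump(model, f)
    with open(path, 'rb') as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(X_small),
                                  restored.predict(X_small))
```

Unpickling can execute arbitrary code, so pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that created them.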
