To complete **Task II**, you need to follow a structured **AI workflow** and implement it in a **Jupyter Notebook**. Below, I’ll break down each step in detail, including how to handle the data, choose algorithms, implement them, and organize your code for maximum marks.

---

## **Step 1: Understand the Task**
- **Goal**: Re-identify individual animals (pigs, koi fish, pigeons) from video or image data using AI.
- **Key Challenges**:
  - Multiple animals in the same frame.
  - Occlusion (animals blocking each other).
  - Low variability between individuals (animals of the same species look very similar, so identities are hard to tell apart).
- **Deliverables**:
  - A Jupyter Notebook with a clear AI workflow.
  - Well-organized, readable, and documented Python code.
  - Use of markdown to explain each step.

---

## **Step 2: Set Up Your Environment**
1. **Install Required Libraries**:
   ```bash
   pip install tensorflow keras ultralytics keras-cv opencv-python matplotlib numpy pandas scikit-learn
   ```
2. **Organize Your Notebook**:
   - Use markdown cells to create sections (e.g., Data Loading, Preprocessing, Model Training, Evaluation).
   - Add comments in code cells to explain each step.

---

## **Step 3: Data Handling**
### **1. Load the Dataset**
- Download the dataset (videos, annotated videos, images, and features).
- Use Python libraries like `os`, `cv2` (OpenCV), and `pandas` to load and explore the data.

```python
import os
import cv2
import pandas as pd

# Load video frames
video_path = "path_to_video.mp4"
cap = cv2.VideoCapture(video_path)
frames = []
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
cap.release()

# Load annotations
annotations = pd.read_csv("path_to_annotations.csv")
```

### **2. Preprocess the Data**
- Resize images to a consistent size (e.g., 224x224 for pretrained models).
- Normalize pixel values (e.g., scale to [0, 1]).
- Split the dataset into training and testing sets, ensuring temporal continuity (e.g., first half for training, second half for testing).

```python
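from sklearn.model_selection import train_test_split

# Chronological split: with shuffle=False the first half of the frames is used for
# training and the second half for testing (assumes one annotation row per frame)
X_train, X_test, y_train, y_test = train_test_split(
    frames, annotations, test_size=0.5, shuffle=False
)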
```
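
A minimal sketch of the normalization and chronological split described above, assuming the frames are already resized, stored in time order, and held as `uint8` arrays. The helper names `normalize_frames` and `temporal_split` are hypothetical, and the frames here are synthetic stand-ins rather than real video data:

```python
import numpy as np

def normalize_frames(frames):
    """Stack uint8 frames into one float32 array scaled to [0, 1]."""
    return np.stack(frames).astype(np.float32) / 255.0

def temporal_split(items, train_fraction=0.5):
    """Split a chronologically ordered sequence without shuffling."""
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]

# Synthetic stand-in for four already-resized 224x224 RGB frames
frames = [np.full((224, 224, 3), i * 60, dtype=np.uint8) for i in range(4)]

X = normalize_frames(frames)
X_train, X_test = temporal_split(X, train_fraction=0.5)
print(X_train.shape, X_test.shape, X.dtype)
```

Keeping the split index-based makes it easy to apply the same cut to the labels so that frames and annotations stay aligned.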

---

## **Step 4: Choose and Implement Models**
### **Option 1: Pretrained Models (VGG16, ResNet-50, etc.)**
- Use these models for **feature extraction** or **fine-tuning**.
- Example: Use VGG16 as a feature extractor and train a small classification head on top of it (or a separate classifier such as an SVM) for re-identification.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

num_classes = 10  # placeholder: set to the number of individual animals in your data

# Load VGG16 without the top layer and freeze it so it acts as a feature extractor
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False

# Add custom layers for classification
x = Flatten()(base_model.output)
x = Dense(128, activation='relu')(x)
output = Dense(num_classes, activation='softmax')(x)
model = Model(base_model.input, output)

# Compile and train the model
# X_train must be a float array of shape (n, 224, 224, 3);
# sparse_categorical_crossentropy expects integer labels in 0..num_classes-1
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=8)
```

### **Option 2: Object Detection (YOLOv8, KerasCV)**
- Use YOLOv8 or KerasCV to detect animals in frames.
- Extract detected regions and use them for re-identification.

```python
from ultralytics import YOLO

# Load YOLOv8 model
model = YOLO("yolov8n.pt")

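# Detect animals in a frame (the call returns a list with one Results object per image)
results = model("path_to_image.jpg")

# Visualize the detections for the first (and only) image
results[0].show()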
```

### **Option 3: Image Segmentation (U-Net)**
- Use U-Net to segment animals from the background.
- Example: Train U-Net on annotated images to produce pixel-level segmentation maps.

```python
# Define and train U-Net (see earlier example)
```
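
The U-Net itself is not reproduced here. Whichever architecture is used, its pixel-level output is typically compared with the annotated mask. A minimal sketch of that comparison using intersection over union (IoU), with a hypothetical helper `mask_iou` and toy masks in place of real predictions:

```python
import numpy as np

def mask_iou(pred_mask, true_mask):
    """Intersection over union of two binary masks with the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks are empty: treat as perfect agreement
    return float(np.logical_and(pred, true).sum() / union)

# Toy example: the annotation covers 4 pixels, the prediction covers 6,
# and the two overlap on 4 pixels
true_mask = np.zeros((4, 4), dtype=bool)
true_mask[1:3, 1:3] = True
pred_mask = np.zeros((4, 4), dtype=bool)
pred_mask[1:3, 1:4] = True

iou = mask_iou(pred_mask, true_mask)
print(f"IoU = {iou:.3f}")
```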

### **Option 4: Similarity Learning (Siamese Networks)**
- Use Siamese networks with triplet loss for re-identification.
- Example: Train a Siamese network to learn a similarity metric between animal images.

```python
# Define and train Siamese network (see earlier example)
```
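
To make the triplet loss mentioned above concrete, here is a minimal NumPy sketch of the loss value only (no network or training loop). It assumes squared Euclidean distance between embeddings and a margin of 0.2. The function name and the toy embeddings are illustrative choices:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean of max(d(a, p) - d(a, n) + margin, 0) with squared Euclidean distance."""
    anchor, positive, negative = (
        np.asarray(x, dtype=float) for x in (anchor, positive, negative)
    )
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # anchor vs. same animal
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # anchor vs. different animal
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

# Two toy triplets of 2-D embeddings
anchor = np.array([[0.0, 0.0], [1.0, 1.0]])
positive = np.array([[0.1, 0.0], [1.0, 1.0]])
negative = np.array([[1.0, 0.0], [1.0, 1.2]])

loss = triplet_loss(anchor, positive, negative, margin=0.2)
print(f"triplet loss = {loss:.4f}")
```

The first triplet contributes nothing because the negative is already much farther away than the positive. The second is penalized because its negative lies within the margin.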

---

## **Step 5: Evaluate the Model**
- Use metrics like **accuracy**, **precision**, **recall**, or **F1-score** to evaluate your model.
- Visualize results (e.g., bounding boxes, segmentation maps, or re-identification examples).

```python
from sklearn.metrics import classification_report

# Evaluate the model: predict() returns class probabilities, so take the most likely class
y_pred = model.predict(X_test).argmax(axis=1)
print(classification_report(y_test, y_pred))
```
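
For re-identification with embeddings (for example from a Siamese network), a common complement to the classification metrics above is rank-1 accuracy: the share of query images whose nearest gallery image belongs to the same animal. A minimal sketch, assuming cosine similarity and using toy 2-D embeddings. `rank1_accuracy` is a hypothetical helper, not a library function:

```python
import numpy as np

def rank1_accuracy(gallery_feats, gallery_ids, query_feats, query_ids):
    """Fraction of queries whose most similar gallery embedding has the same identity."""
    g = np.asarray(gallery_feats, dtype=float)
    q = np.asarray(query_feats, dtype=float)
    # L2-normalize so that the dot product equals cosine similarity
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    nearest = np.argmax(q @ g.T, axis=1)
    return float(np.mean(np.asarray(gallery_ids)[nearest] == np.asarray(query_ids)))

gallery_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
gallery_ids = [3, 5, 6]
query_feats = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0], [-0.8, 0.3]]
query_ids = [3, 5, 6, 6]

acc = rank1_accuracy(gallery_feats, gallery_ids, query_feats, query_ids)
print(f"rank-1 accuracy = {acc:.2f}")
```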

---

## **Step 6: Document and Organize Your Notebook**
1. **Markdown Cells**:
   - Use markdown to explain each step (e.g., data loading, preprocessing, model training, evaluation).
   - Add headings, subheadings, and bullet points for clarity.
2. **Code Comments**:
   - Add comments to explain each line of code.
3. **Visualizations**:
   - Include plots, images, or graphs to illustrate your results.
4. **Conclusion**:
   - Summarize your findings, challenges, and potential improvements.

---

## **Step 7: Submit Your Work**
- Save your Jupyter Notebook and submit it to Canvas.
- Ensure your notebook is well-organized, readable, and follows best practices.

---

## **Example Notebook Structure**
```markdown
# Animal Re-Identification Using AI

## 1. Introduction
- Explain the task and dataset.

## 2. Data Loading
- Load videos, images, and annotations.

## 3. Data Preprocessing
- Resize, normalize, and split the data.

## 4. Model Implementation
- Choose and implement models (e.g., VGG16, YOLOv8, U-Net, Siamese Network).

## 5. Model Evaluation
- Evaluate the model using metrics and visualizations.

## 6. Conclusion
- Summarize results and discuss challenges.
```

---

By following these steps, you’ll create a high-quality Jupyter Notebook that demonstrates a clear AI workflow, well-implemented algorithms, and best coding practices. Let me know if you need further assistance!

In [1]:
import pandas as pd
import os

In [2]:
os.listdir("task2_data/features/")

['h2_Koi_5652_952_540_MN2.csv',
 'h2_Pigeons_8234_1280_720_AE.csv',
 'h1_Pigeons_29033_960_540_300f_LBP.csv',
 'h2_Pigeons_8234_1280_720_LBP.csv',
 'h2_Pigeons_4927_960_540_600f_LBP.csv',
 'h1_Pigeons_8234_1280_720_RGB.csv',
 'h1_Pigeons_29033_960_540_300f_HOG.csv',
 'h1_Pigs_49651_960_540_500f_MN2.csv',
 'h2_Pigeons_4927_960_540_600f_HOG.csv',
 'h1_Koi_5652_952_540_RGB.csv',
 'h2_Pigeons_8234_1280_720_HOG.csv',
 'h1_Koi_5652_952_540_AE.csv',
 'h1_Pigeons_4927_960_540_600f_MN2.csv',
 'h2_Pigs_49651_960_540_500f_LBP.csv',
 'h1_Pigeons_8234_1280_720_AE.csv',
 'h1_Pigeons_29033_960_540_300f_AE.csv',
 'h2_Pigs_49651_960_540_500f_HOG.csv',
 'h2_Pigeons_29033_960_540_300f_RGB.csv',
 'h2_Koi_5652_952_540_HOG.csv',
 'h2_Pigeons_29033_960_540_300f_AE.csv',
 'h1_Pigs_49651_960_540_500f_LBP.csv',
 'h1_Pigeons_29033_960_540_300f_MN2.csv',
 'h2_Koi_5652_952_540_LBP.csv',
 'h2_Pigs_49651_960_540_500f_AE.csv',
 'h2_Koi_5652_952_540_AE.csv',
 'h1_Pigs_49651_960_540_500f_HOG.csv',
 'h2_Pigeons_4927_960

In [3]:
# Check to see how many rows and columns are in each CSV file

import os
import pandas as pd

# Specify the folder path
folder_path = "task2_data/features/"

# List CSV files and sort alphabetically
csv_files = sorted([f for f in os.listdir(folder_path) if f.endswith(".csv")])

lines = 0
# Iterate through CSV files and check the shape (rows, columns)
for file_name in csv_files:
    file_path = os.path.join(folder_path, file_name)
    
    try:
        # Read the CSV into a DataFrame
        df = pd.read_csv(file_path)
        
        # Get the number of rows and columns
        rows, columns = df.shape
        
        # Print the result
        print(f"{file_name}: Rows = {rows}, Columns = {columns}")
    
    except Exception as e:
        print(f"Error reading {file_name}: {e}")

    lines += 1
    if lines == 5:
        print("\n")
        lines = 0


h1_Koi_5652_952_540_AE.csv: Rows = 916, Columns = 11
h1_Koi_5652_952_540_HOG.csv: Rows = 916, Columns = 577
h1_Koi_5652_952_540_LBP.csv: Rows = 916, Columns = 11
h1_Koi_5652_952_540_MN2.csv: Rows = 916, Columns = 1281
h1_Koi_5652_952_540_RGB.csv: Rows = 916, Columns = 55


h1_Pigeons_29033_960_540_300f_AE.csv: Rows = 2148, Columns = 11
h1_Pigeons_29033_960_540_300f_HOG.csv: Rows = 2148, Columns = 577
h1_Pigeons_29033_960_540_300f_LBP.csv: Rows = 2148, Columns = 11
h1_Pigeons_29033_960_540_300f_MN2.csv: Rows = 2148, Columns = 1281
h1_Pigeons_29033_960_540_300f_RGB.csv: Rows = 2148, Columns = 55


h1_Pigeons_4927_960_540_600f_AE.csv: Rows = 1574, Columns = 11
h1_Pigeons_4927_960_540_600f_HOG.csv: Rows = 1574, Columns = 577
h1_Pigeons_4927_960_540_600f_LBP.csv: Rows = 1574, Columns = 11
h1_Pigeons_4927_960_540_600f_MN2.csv: Rows = 1574, Columns = 1281
h1_Pigeons_4927_960_540_600f_RGB.csv: Rows = 1574, Columns = 55


h1_Pigeons_8234_1280_720_AE.csv: Rows = 2268, Columns = 11
h1_Pigeons_823

For a given feature type, every file has the same number of columns regardless of the animal; for example, every AE feature file has 11 columns.
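
This observation can also be checked programmatically rather than by eye. A minimal sketch, using shapes copied from the fully printed lines of the output above. The `shapes` dictionary and the variable names are illustrative, and the feature name is assumed to be the last underscore-separated part of the file name:

```python
import os
from collections import defaultdict

# (rows, columns) copied from the printed output for two of the recordings
shapes = {
    "h1_Koi_5652_952_540_AE.csv": (916, 11),
    "h1_Koi_5652_952_540_HOG.csv": (916, 577),
    "h1_Koi_5652_952_540_LBP.csv": (916, 11),
    "h1_Koi_5652_952_540_MN2.csv": (916, 1281),
    "h1_Koi_5652_952_540_RGB.csv": (916, 55),
    "h1_Pigeons_29033_960_540_300f_AE.csv": (2148, 11),
    "h1_Pigeons_29033_960_540_300f_HOG.csv": (2148, 577),
    "h1_Pigeons_29033_960_540_300f_LBP.csv": (2148, 11),
    "h1_Pigeons_29033_960_540_300f_MN2.csv": (2148, 1281),
    "h1_Pigeons_29033_960_540_300f_RGB.csv": (2148, 55),
}

# Collect the distinct column counts seen for each feature type
columns_by_feature = defaultdict(set)
for file_name, (_, n_columns) in shapes.items():
    feature = os.path.splitext(file_name)[0].rsplit("_", 1)[-1]
    columns_by_feature[feature].add(n_columns)

for feature, counts in sorted(columns_by_feature.items()):
    status = "consistent" if len(counts) == 1 else "INCONSISTENT"
    print(f"{feature}: {sorted(counts)} ({status})")
```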

In [4]:
# import os
# import pandas as pd

# # Specify the folder path
# folder_path = "task2_data/features/"

# # Get all CSV files in the folder
# csv_files = [f for f in os.listdir(folder_path) if f.endswith(".csv")]

# # Iterate through CSV files and check the shape (rows, columns)
# for file_name in csv_files:
#     file_path = os.path.join(folder_path, file_name)
    
#     # Read the CSV into a DataFrame
#     try:
#         df = pd.read_csv(file_path)
        
#         # Get the number of rows and columns
#         rows, columns = df.shape
        
#         print(f"{file_name}: Rows = {rows}, Columns = {columns}")
#     except Exception as e:
#         print(f"Error reading {file_name}: {e}")


In [5]:
species_names = ["Koi_5652_952_540", "Pigeons_4927_960_540_600f", "Pigeons_8234_1280_720", "Pigeons_29033_960_540_300f", "Pigs_49651_960_540_500f"]
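
Instead of typing this list by hand, the names could be derived from the feature file names. A minimal sketch, assuming every file follows the pattern `<h1|h2>_<recording name>_<feature>.csv` seen in the listing above. `parse_feature_file` is a hypothetical helper, and the short `file_names` list stands in for `os.listdir(...)`:

```python
import re

# Assumed naming pattern: <half>_<recording name>_<feature>.csv
FILE_PATTERN = re.compile(r"^(h[12])_(.+)_(AE|HOG|LBP|MN2|RGB)\.csv$")

def parse_feature_file(file_name):
    """Split a feature file name into its parts, or return None if it does not match."""
    match = FILE_PATTERN.match(file_name)
    if match is None:
        return None
    half, recording, feature = match.groups()
    return {"half": half, "recording": recording, "feature": feature}

# A few names taken from the directory listing
file_names = [
    "h2_Koi_5652_952_540_MN2.csv",
    "h1_Pigeons_29033_960_540_300f_LBP.csv",
    "h1_Pigs_49651_960_540_500f_MN2.csv",
    "h1_Pigeons_8234_1280_720_RGB.csv",
    "h2_Pigeons_4927_960_540_600f_HOG.csv",
]

parsed = [p for p in map(parse_feature_file, file_names) if p is not None]
derived_names = sorted({p["recording"] for p in parsed})
print(derived_names)
```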

# Some info about the data
The frames dataframe records the size of each frame and the names of the animals that appear in it.

The images folder contains the individual image for each animal appearance listed in the frames folder.

The features folder contains the extracted features: each row represents one animal and its feature values (probably taken from the first frame).

In [6]:
frames_file_path = "task2_data/frames/"
features_file_path = "task2_data/features/"

In [7]:
features = ['AE', 'HOG', 'LBP', 'MN2', 'RGB']

In [8]:
# import os

# # Folder path
# # folder_path = "/path/to/your/folder"
# search_string = "Koi_5652_952_540"

# # List to store matching file paths
# matching_files = []

# # Iterate through all files in the folder
# for file_name in os.listdir(features_file_path):
#     if file_name.endswith(".csv") and search_string in file_name and 'AE' in file_name:
#         file_path = os.path.join(features_file_path, file_name)
#         matching_files.append(file_path)

# print(matching_files)

In [9]:
# Find files with the animal name

In [10]:
# import os

# # Folder path
# # folder_path = "/path/to/your/folder"
# search_string = "Koi_5652_952_540"
# features = ['AE']
# # features = ['AE', 'HOG', 'LBP', 'MN2', 'RGB']

# # List to store matching file paths
# matching_files = []

# # Iterate through all files in the folder
# for file_name in os.listdir(features_file_path):
#     for feature in features:
#         if file_name.endswith(".csv") and search_string in file_name and feature in file_name:
#             file_path = os.path.join(features_file_path, file_name)
#             matching_files.append(file_path)

# print(matching_files)

In [11]:
# empty_df = pd.DataFrame()
# for csv_path in matching_files:
#     df = pd.read_csv(csv_path)
#     empty_df = pd.concat([empty_df, df], ignore_index=True)

In [12]:
import os

search_string = "h1_Koi_5652_952_540"
features = ['AE', 'HOG', 'LBP', 'MN2', 'RGB']

# Collect every feature CSV for this recording, in directory-listing order
matching_files = [
    os.path.join(features_file_path, file_name)
    for file_name in os.listdir(features_file_path)
    if search_string in file_name
    and any(file_name.endswith(f"_{feature}.csv") for feature in features)
]

# Read each file exactly once and join the feature blocks side by side (one row per sample).
# The earlier version re-read all previously matched files on every iteration,
# which duplicated columns in the result.
empty_df = pd.concat([pd.read_csv(path) for path in matching_files], axis=1)

In [13]:
empty_df

Unnamed: 0,RGB_0,RGB_1,RGB_2,RGB_3,RGB_4,RGB_5,RGB_6,RGB_7,RGB_8,RGB_9,...,HOG_567,HOG_568,HOG_569,HOG_570,HOG_571,HOG_572,HOG_573,HOG_574,HOG_575,Labels
0,92.656283,30.075313,128.165164,28.226753,159.806734,19.300880,118.626060,62.803117,142.892895,48.195982,...,0.061997,0.037025,0.046570,0.095946,0.253039,0.249480,0.183246,0.191079,0.132782,3
1,127.844294,76.359100,139.349130,47.587507,164.492263,35.376289,111.109684,63.062812,127.634881,35.596722,...,0.298187,0.298187,0.014142,0.039491,0.298187,0.255253,0.146513,0.143696,0.161387,5
2,118.626523,60.570514,145.617563,48.800228,172.239247,34.897283,168.047696,49.544111,193.491780,44.145432,...,0.124098,0.227924,0.247540,0.247540,0.059532,0.025870,0.094370,0.146290,0.081354,6
3,78.643852,9.305123,115.772336,8.541012,155.624385,9.843467,74.183232,11.678523,111.048577,8.355872,...,0.182955,0.188289,0.104970,0.061411,0.052850,0.047572,0.040652,0.079280,0.126164,8
4,95.963400,31.316742,131.246521,29.415611,161.665910,19.432514,100.626377,50.740002,130.783813,40.745177,...,0.033997,0.046893,0.044863,0.104401,0.251770,0.251770,0.222159,0.083488,0.072167,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
911,135.133903,53.611770,157.318274,40.919168,177.519740,23.488434,204.845635,36.386729,195.401984,34.386114,...,0.057231,0.029473,0.023948,0.033575,0.039762,0.039968,0.022297,0.018267,0.038727,1
912,87.396205,1.293988,124.466518,1.473743,168.726283,2.019752,94.556818,22.405614,127.275852,11.056228,...,0.068071,0.147745,0.058853,0.061582,0.069138,0.050976,0.103027,0.212321,0.132553,2
913,84.670397,1.756491,126.427018,1.691956,166.759931,1.762654,123.675379,53.981933,156.971079,43.468853,...,0.012887,0.011578,0.056723,0.130149,0.287459,0.080640,0.033522,0.022872,0.022063,9
914,87.379929,3.080394,124.309091,3.028069,168.058560,3.882591,108.644864,37.976834,132.858796,11.292119,...,0.026960,0.018131,0.052725,0.078077,0.053549,0.070032,0.089480,0.123514,0.029186,2


In [14]:
11+577+11+1281+55

1935
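
The displayed frames suggest that each feature file ends with its own `Labels` column (for example, `HOG_0` to `HOG_575` plus `Labels` would account for the 577 columns), in which case the 1935 total counts the labels five times. A minimal sketch of joining the blocks while keeping a single label column, assuming the rows of all files are in the same order. `combine_feature_blocks` is a hypothetical helper and the small frames are made-up stand-ins for the real CSVs:

```python
import pandas as pd

def combine_feature_blocks(blocks, label_column="Labels"):
    """Join per-feature DataFrames side by side, keeping one copy of the label column."""
    labels = blocks[0][label_column]
    for block in blocks[1:]:
        # Refuse to merge if the files disagree about the labels
        if not block[label_column].equals(labels):
            raise ValueError("label columns differ between feature files")
    features_only = [block.drop(columns=[label_column]) for block in blocks]
    return pd.concat(features_only + [labels], axis=1)

# Made-up stand-ins for two feature files of the same recording
ae = pd.DataFrame({"AE_0": [0.1, 0.2], "AE_1": [0.3, 0.4], "Labels": [3, 5]})
rgb = pd.DataFrame({"RGB_0": [92.0, 127.0], "Labels": [3, 5]})

combined = combine_feature_blocks([ae, rgb])
print(combined.columns.tolist())
```

Under that assumption, the combined Koi frame would have 1935 - 4 = 1931 columns: 1930 feature columns plus one `Labels` column.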

In [15]:
for species in species_names:
    print(species)

Koi_5652_952_540
Pigeons_4927_960_540_600f
Pigeons_8234_1280_720
Pigeons_29033_960_540_300f
Pigs_49651_960_540_500f


In [16]:
df_list = []
species = "h1_Koi_5652_952_540"
# List CSV files and sort alphabetically
csv_files = sorted([f for f in os.listdir(features_file_path) if f.endswith(".csv")])
for csv_file in csv_files:
    if species in csv_file:
        print(csv_file)
        animal_df = pd.read_csv(os.path.join(features_file_path, csv_file))
        df_list.append(animal_df)
# Concatenate all DataFrames horizontally (side by side)
combined_df = pd.concat(df_list, axis=1)

# Display the combined DataFrame
combined_df

h1_Koi_5652_952_540_AE.csv
h1_Koi_5652_952_540_HOG.csv
h1_Koi_5652_952_540_LBP.csv
h1_Koi_5652_952_540_MN2.csv
h1_Koi_5652_952_540_RGB.csv


Unnamed: 0,AE_0,AE_1,AE_2,AE_3,AE_4,AE_5,AE_6,AE_7,AE_8,AE_9,...,RGB_45,RGB_46,RGB_47,RGB_48,RGB_49,RGB_50,RGB_51,RGB_52,RGB_53,Labels
0,0.382822,0.004994,0.227260,0.006099,0.014593,0.001970,0.093760,0.002686,0.149170,0.006581,...,49.957177,193.921294,39.134372,138.728980,74.953124,162.637388,59.693482,187.732478,43.428720,3
1,0.001117,0.002207,0.566851,0.013512,0.995071,0.103717,0.003387,0.005848,0.999922,0.674592,...,31.067555,134.383575,31.284545,197.848611,69.266967,152.619907,41.478282,151.576389,42.871045,5
2,0.012490,0.092357,0.005952,0.013071,0.032254,0.011199,0.421082,0.033926,0.005940,0.012922,...,32.311817,159.852775,23.791283,158.988522,70.893656,174.354261,57.925522,192.777217,44.033565,6
3,0.016480,0.019402,0.003902,0.014737,0.021675,0.051318,0.010404,0.005301,0.053584,0.010807,...,61.966819,199.524340,46.475302,74.881160,5.270035,111.883770,5.236539,150.326809,5.399256,8
4,0.460041,0.002317,0.186082,0.011455,0.011283,0.001504,0.555945,0.002121,0.080146,0.004765,...,48.394256,194.109333,37.883475,147.378689,73.690020,168.752214,59.434738,192.958383,42.708400,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
911,0.526105,0.835154,0.479839,0.114679,0.097497,0.225052,0.016624,0.010848,0.010602,0.400713,...,42.121156,203.198828,21.818927,94.953906,20.220073,131.257812,16.188150,173.177734,9.529456,1
912,0.003246,0.003687,0.091655,0.008521,0.003879,0.002877,0.003848,0.000777,0.880966,0.003632,...,38.608198,130.449510,50.224801,164.023752,57.659006,148.968151,37.768979,156.900945,52.071492,2
913,0.007441,0.020624,0.006735,0.014043,0.003728,0.014027,0.029215,0.010190,0.067186,0.007475,...,37.025112,198.192361,32.044099,119.756404,50.783510,150.815156,40.768752,182.388096,26.817174,9
914,0.003308,0.005175,0.206821,0.006885,0.001692,0.006112,0.042922,0.000180,0.221735,0.013784,...,31.597794,146.579021,43.381658,128.244963,48.171764,141.439789,24.820114,168.852106,34.276541,2


In [None]:
species2 = "h2_Koi_5652_952_540"
