# AI Tools Assignment: Mastering the AI Toolkit 🛠️🧠

This notebook implements classical machine learning, deep learning, and NLP tasks with scikit-learn, TensorFlow, and spaCy. The assignment is divided into three main parts:
1. Theoretical Understanding
2. Practical Implementation
3. Ethics & Optimization

## Setup
First, let's install and import all the required libraries.

In [None]:
# Install required packages
!pip install tensorflow torch scikit-learn spacy pandas numpy matplotlib seaborn streamlit flask tensorflow-model-analysis


In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import tensorflow as tf
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.preprocessing import LabelEncoder
import spacy
import matplotlib.pyplot as plt
import seaborn as sns

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)
torch.manual_seed(42)

# Part 1: Theoretical Understanding

## Short Answer Questions

### Q1: TensorFlow vs PyTorch
Explain the primary differences between TensorFlow and PyTorch, and when you would choose one over the other.

[Your answer here]

### Q2: Jupyter Notebooks Use Cases
Describe two use cases for Jupyter Notebooks in AI development.

[Your answer here]

### Q3: spaCy's NLP Capabilities
How does spaCy enhance NLP tasks compared to basic Python string operations?

[Your answer here]

## Comparative Analysis

Compare Scikit-learn and TensorFlow in terms of:

| Feature | Scikit-learn | TensorFlow |
|---------|--------------|------------|
| Target applications | | |
| Ease of use for beginners | | |
| Community support | | |

[Fill in the comparison table above with your analysis]

# Part 2: Practical Implementation

## Task 1: Classical ML with Scikit-learn - Iris Dataset

In this task, we will:
1. Load and preprocess the Iris dataset
2. Handle any missing values
3. Encode labels
4. Train a decision tree classifier
5. Evaluate the model using accuracy, precision, and recall

In [None]:
# Load the Iris dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Check for missing values
print("Missing values in the dataset:")
print(X.isnull().sum())

# Create train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("\nDataset shapes:")
print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
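
Iris ships with no missing values and with integer labels, so steps 2 and 3 of the task list have nothing to do in the cell above. The following is a minimal sketch of how they could look on messier data. The `X_demo`, `y_demo`, and `species` names and the three blanked-out cells are assumptions made for illustration only; they are not part of the assignment data.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)

# Illustration only: blank out three cells so the imputer has work to do.
X_demo = X.copy()
X_demo.iloc[[0, 50, 100], 0] = np.nan

# String species names, as they might appear in a raw CSV file.
species = iris.target_names[iris.target]

# Step 3: encode the string labels as integers.
encoder = LabelEncoder()
y_demo = encoder.fit_transform(species)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Step 2: fit the imputer on the training split only, then apply it to both
# splits, so no test-set statistics leak into training.
imputer = SimpleImputer(strategy="median")
X_tr_imp = pd.DataFrame(imputer.fit_transform(X_tr), columns=X.columns, index=X_tr.index)
X_te_imp = pd.DataFrame(imputer.transform(X_te), columns=X.columns, index=X_te.index)

print("Missing before:", int(X_demo.isnull().sum().sum()))
print("Missing after:", int(X_tr_imp.isnull().sum().sum() + X_te_imp.isnull().sum().sum()))
print("Classes:", list(encoder.classes_))
```

Median imputation is only one reasonable choice here; dropping rows or using the mean would follow the same fit-on-train, transform-both pattern.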

In [None]:
# Train Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')

print("Model Performance Metrics:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")

# Visualize feature importance
feature_importance = pd.DataFrame({
    'feature': iris.feature_names,
    'importance': clf.feature_importances_
})
plt.figure(figsize=(10, 6))
sns.barplot(data=feature_importance, x='importance', y='feature')
plt.title('Feature Importance in Iris Classification')
plt.show()
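
To make the `average='weighted'` setting used above concrete, here is a small hand-rolled sketch of the same idea: compute precision and recall per class, then average them with each class weighted by its share of the true labels. The function name and the toy labels are illustrative assumptions, not part of the assignment.

```python
from collections import Counter


def weighted_precision_recall(y_true, y_pred):
    """Support-weighted average of per-class precision and recall."""
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    precision = 0.0
    recall = 0.0
    for label in sorted(support):
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        class_precision = true_pos / predicted if predicted else 0.0
        class_recall = true_pos / support[label]
        weight = support[label] / n
        precision += weight * class_precision
        recall += weight * class_recall
    return precision, recall


# Toy labels for illustration only.
toy_true = [0, 0, 1, 1, 1, 2]
toy_pred = [0, 1, 1, 1, 2, 2]
p, r = weighted_precision_recall(toy_true, toy_pred)
print(f"Precision: {p:.4f}")  # Precision: 0.7500
print(f"Recall: {r:.4f}")     # Recall: 0.6667
```

Note that support-weighted recall always equals plain accuracy (4 of 6 correct here), so on a multiclass problem the `Accuracy` and `Recall` lines above will print the same number.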

## Task 2: Deep Learning with TensorFlow/PyTorch - MNIST Classification

In this task, we will:
1. Load and preprocess the MNIST dataset
2. Build a CNN model
3. Train the model to achieve >95% test accuracy
4. Visualize predictions on sample images

We'll use TensorFlow for this implementation.

In [None]:
# Load and preprocess MNIST dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

# Reshape for CNN (add channel dimension)
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)

print("Dataset shapes:")
print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")

In [None]:
# Define the CNN model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

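# Train the model, holding out 10% of the training data for validation
# so the test set stays untouched until the final evaluation
history = model.fit(X_train, y_train, epochs=10,
                    validation_split=0.1,
                    batch_size=128)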

# Evaluate the model
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"\nTest accuracy: {test_accuracy:.4f}")

# Plot training history
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()
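
It can help to check the CNN's layer arithmetic by hand. The sketch below walks through the architecture defined above (3x3 convolutions with the default "valid" padding and stride 1, 2x2 max pooling) and tallies feature-map sizes and trainable parameters. The helper names are illustrative; `model.summary()` should report the same figures.

```python
def conv_out(size, kernel):
    """Output width of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool_out(size, pool):
    """Output width of non-overlapping max pooling."""
    return size // pool


# (kind, filters) for the convolutional part of the model above.
layers = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None), ("conv", 64)]

size, channels, total_params = 28, 1, 0
for kind, filters in layers:
    if kind == "conv":
        # Each filter has 3*3*channels weights plus one bias.
        total_params += (3 * 3 * channels + 1) * filters
        size, channels = conv_out(size, 3), filters
    else:
        size = pool_out(size, 2)
    print(f"{kind}: {size}x{size}x{channels}")

flat = size * size * channels
total_params += (flat + 1) * 64   # Dense(64)
total_params += (64 + 1) * 10     # Dense(10)

print(f"Flattened features: {flat}")          # Flattened features: 576
print(f"Trainable parameters: {total_params}")  # Trainable parameters: 93322
```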

In [None]:
# Select 5 random test images
n_samples = 5
sample_indices = np.random.choice(len(X_test), n_samples, replace=False)
sample_images = X_test[sample_indices]
sample_labels = y_test[sample_indices]

# Get predictions
predictions = model.predict(sample_images)
predicted_labels = np.argmax(predictions, axis=1)

# Plot the results
plt.figure(figsize=(15, 3))
for i in range(n_samples):
    plt.subplot(1, n_samples, i + 1)
    plt.imshow(sample_images[i].reshape(28, 28), cmap='gray')
    plt.title(f'True: {sample_labels[i]}\nPred: {predicted_labels[i]}')
    plt.axis('off')
plt.tight_layout()
plt.show()

## Task 3: NLP with spaCy - Named Entity Recognition and Sentiment Analysis

In this task, we will:
1. Load and process Amazon product reviews
2. Perform Named Entity Recognition (NER) to extract product names and brands
3. Implement a rule-based sentiment analysis
4. Display extracted entities and sentiment results

In [None]:
# Download spaCy English model
!python -m spacy download en_core_web_sm

# Sample Amazon reviews data
reviews = [
    "The Apple iPhone 13 Pro has an amazing camera and great battery life. Highly recommend!",
    "This Samsung TV is terrible. The picture quality is poor and customer service is awful.",
    "Nike running shoes are comfortable and durable. Perfect for long distance running.",
    "The Sony headphones have excellent sound quality but they're a bit expensive.",
    "Dell laptop keeps crashing. Waste of money, very disappointed with the product."
]

# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Function to extract entities
def extract_entities(text):
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return entities

# Simple rule-based sentiment analysis
def analyze_sentiment(text):
    positive_words = {'amazing', 'great', 'excellent', 'good', 'perfect', 'comfortable', 'recommend'}
    negative_words = {'terrible', 'poor', 'awful', 'waste', 'disappointed', 'bad', 'crashes', 'crashing'}

    # Compare whole tokens rather than substrings, so that 'bad' does not
    # match inside 'badge' and 'good' does not match inside 'goods'
    tokens = {token.lower_ for token in nlp(text)}
    positive_score = len(tokens & positive_words)
    negative_score = len(tokens & negative_words)

    if positive_score > negative_score:
        return 'Positive'
    elif negative_score > positive_score:
        return 'Negative'
    else:
        return 'Neutral'

# Process reviews
results = []
for review in reviews:
    entities = extract_entities(review)
    sentiment = analyze_sentiment(review)
    results.append({
        'review': review,
        'entities': entities,
        'sentiment': sentiment
    })

# Display results
for result in results:
    print(f"\nReview: {result['review']}")
    print(f"Entities: {result['entities']}")
    print(f"Sentiment: {result['sentiment']}")
    print("-" * 80)
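
The NER step returns every entity type the model finds (dates, quantities, places, and so on), while the task asks only for product names and brands. One possible way to narrow the output is to keep just the `ORG` and `PRODUCT` labels, which are part of the label scheme used by `en_core_web_sm`. The helper name and the sample tuples below are hypothetical; they are hand-written and not actual model output, which may differ.

```python
BRAND_PRODUCT_LABELS = {"ORG", "PRODUCT"}


def filter_brand_product(entities, keep_labels=BRAND_PRODUCT_LABELS):
    """Keep (text, label) pairs with a wanted label, dropping duplicates in order."""
    seen = set()
    kept = []
    for text, label in entities:
        if label in keep_labels and (text, label) not in seen:
            seen.add((text, label))
            kept.append((text, label))
    return kept


# Hand-written example in the same shape as extract_entities() output.
sample_entities = [
    ("Apple", "ORG"),
    ("iPhone 13 Pro", "PRODUCT"),
    ("13", "CARDINAL"),
    ("Apple", "ORG"),
]
print(filter_brand_product(sample_entities))
# [('Apple', 'ORG'), ('iPhone 13 Pro', 'PRODUCT')]
```

In the loop above this could be applied as `filter_brand_product(extract_entities(review))`. Small models often mislabel or miss brand names, so the filtered list is best treated as a starting point.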

# Part 3: Ethics & Optimization

## Ethical Considerations in AI Models

### MNIST Model Bias Analysis
1. Potential biases in the MNIST dataset:
   - Limited handwriting styles and variations
   - Possible underrepresentation of different cultural writing styles
   - Bias towards certain digit writing conventions

2. Mitigating biases:
   - Data augmentation to increase variation
   - Using TensorFlow Fairness Indicators for model evaluation
   - Regular model monitoring and retraining
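
As one example of the data augmentation mentioned above, the sketch below shifts each image by a few pixels in a random direction, padding with zeros (the MNIST background value). This is a NumPy-only illustration under the assumption that images are shaped `(n, 28, 28, 1)` as in Task 2; the function name and `max_shift` default are hypothetical choices, and rotation or scaling would be equally valid augmentations.

```python
import numpy as np


def random_shift(images, max_shift=2, rng=None):
    """Shift each image by up to max_shift pixels per axis, zero-padding the edges."""
    rng = np.random.default_rng() if rng is None else rng
    n, height, width, _ = images.shape
    pad = ((0, 0), (max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(images, pad)  # constant zero padding by default
    shifted = np.empty_like(images)
    for i in range(n):
        # An offset equal to max_shift leaves the image where it was.
        dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
        shifted[i] = padded[i, dy:dy + height, dx:dx + width, :]
    return shifted


# Toy batch: four blank images with a single bright pixel in the centre.
batch = np.zeros((4, 28, 28, 1), dtype="float32")
batch[:, 14, 14, 0] = 1.0
augmented = random_shift(batch, max_shift=2, rng=np.random.default_rng(42))
print(augmented.shape)               # (4, 28, 28, 1)
print(augmented.sum(axis=(1, 2, 3)))  # each image still contains one bright pixel
```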

### Amazon Reviews Model Bias Analysis
1. Potential biases in sentiment analysis:
   - Language and cultural biases in sentiment words
   - Context-dependent interpretations
   - Brand name recognition limitations

2. Mitigating biases:
   - Using multilingual models
   - Implementing context-aware analysis
   - Regular updates to sentiment dictionaries
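
A small step toward the context-aware analysis listed above is negation handling: a polarity word flips sign when a negator appears shortly before it. The sketch below is a rough illustration with hypothetical word lists and a two-token window; it is not a substitute for a trained sentiment model.

```python
import re

POSITIVE = {"amazing", "great", "excellent", "good", "perfect", "comfortable"}
NEGATIVE = {"terrible", "poor", "awful", "bad", "disappointed"}
NEGATORS = {"not", "no", "never"}


def score_sentiment(text, window=2):
    """Lexicon sentiment where a nearby preceding negator flips polarity."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = 1 if token in POSITIVE else -1 if token in NEGATIVE else 0
        if polarity == 0:
            continue
        preceding = tokens[max(0, i - window):i]
        # Treat "not", "never", and contractions such as "isn't" as negators.
        if any(t in NEGATORS or t.endswith("n't") for t in preceding):
            polarity = -polarity
        score += polarity
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


print(score_sentiment("The camera is not good."))  # Negative
print(score_sentiment("It isn't bad at all."))     # Positive
print(score_sentiment("It arrived on Tuesday."))   # Neutral
```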

In [None]:
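# Per-class accuracy as a simple stand-in for a fairness check on MNIST.
# This cell does not call TensorFlow Fairness Indicators itself; MNIST has no
# demographic attributes to slice on, so results are sliced by digit instead.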

# Function to create a simple fairness evaluation
def evaluate_model_fairness(model, test_data, test_labels):
    predictions = model.predict(test_data)
    predicted_labels = np.argmax(predictions, axis=1)
    
    # Accuracy over the samples whose true label is `digit`
    # (this is the same quantity as per-class recall)
    class_accuracies = []
    for digit in range(10):
        mask = test_labels == digit
        class_acc = accuracy_score(test_labels[mask], predicted_labels[mask])
        class_accuracies.append(class_acc)
        print(f"Accuracy for digit {digit}: {class_acc:.4f}")
    
    # Calculate fairness metrics
    min_acc = min(class_accuracies)
    max_acc = max(class_accuracies)
    acc_disparity = max_acc - min_acc
    
    print("\nFairness Metrics:")
    print(f"Minimum class accuracy: {min_acc:.4f}")
    print(f"Maximum class accuracy: {max_acc:.4f}")
    print(f"Accuracy disparity: {acc_disparity:.4f}")
    
    return class_accuracies

# Evaluate fairness on MNIST model
class_accuracies = evaluate_model_fairness(model, X_test, y_test)

# Visualize class-wise accuracies
plt.figure(figsize=(10, 6))
plt.bar(range(10), class_accuracies)
plt.title('Accuracy Distribution Across Digits')
plt.xlabel('Digit')
plt.ylabel('Accuracy')
plt.show()

# Bonus Task: Deploy MNIST Classifier with Streamlit

In this section, we'll create a simple web interface for our MNIST classifier using Streamlit. This will allow users to draw digits and get real-time predictions from our model.
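
The following is a rough sketch of what such an app might look like, not a finished deployment. It makes several assumptions: the trained model has been saved beforehand (for example with `model.save("mnist_cnn.keras")`; the file name is hypothetical), the code lives in a separate `app.py` started with `streamlit run app.py`, and digits are supplied through a file upload rather than a drawing canvas, since drawing requires an additional third-party component. MNIST digits are light on a dark background, so the sketch offers an option to invert typical dark-on-light images.

```python
import numpy as np


def preprocess_digit(image, invert=False):
    """Turn a 28x28 grayscale array (values 0-255) into a model-ready batch."""
    arr = np.asarray(image, dtype="float32") / 255.0
    if arr.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got {arr.shape}")
    if invert:
        arr = 1.0 - arr  # dark-on-light input becomes light-on-dark like MNIST
    return arr.reshape(1, 28, 28, 1)


def main():
    # Imported here so the helper above can be reused without these packages.
    import streamlit as st
    import tensorflow as tf
    from PIL import Image

    st.title("MNIST Digit Classifier")
    model = tf.keras.models.load_model("mnist_cnn.keras")  # hypothetical path

    uploaded = st.file_uploader("Upload a digit image", type=["png", "jpg", "jpeg"])
    invert = st.checkbox("Invert colors (dark digit on light background)", value=True)

    if uploaded is not None:
        image = Image.open(uploaded).convert("L").resize((28, 28))
        probabilities = model.predict(preprocess_digit(np.array(image), invert))[0]
        st.image(image, width=140)
        st.write(f"Predicted digit: {int(np.argmax(probabilities))}")
        st.bar_chart(probabilities)


# In app.py, finish with a call to main(); Streamlit reruns the script
# from top to bottom on every interaction.
# main()
```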