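TensorFlow vs. PyTorch
TensorFlow was built around static computation graphs, which scale well and suit production deployment; since TF 2.x it executes eagerly by default and compiles graphs on request through tf.function. PyTorch builds its graph dynamically as the code runs, which makes it easier to debug and a common choice for research.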
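
The difference between the two styles can be illustrated without either framework. The sketch below is a toy in plain Python, not TensorFlow or PyTorch code; the `Node` class and the `eager` function are hypothetical names invented for this illustration.

```python
# Toy illustration only: neither TensorFlow nor PyTorch is used here.

class Node:
    """A deferred operation in a hypothetical define-then-run graph."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def run(self, feed):
        # Leaf nodes hold a placeholder name that is looked up in the feed dict
        if self.op == "input":
            return feed[self.inputs[0]]
        values = [node.run(feed) for node in self.inputs]
        if self.op == "add":
            return values[0] + values[1]
        if self.op == "mul":
            return values[0] * values[1]
        raise ValueError(f"unknown op: {self.op}")


# Static style: describe the whole computation first, execute it later.
x = Node("input", "x")
y = Node("input", "y")
graph = Node("add", Node("mul", x, x), y)  # x*x + y, nothing computed yet
static_result = graph.run({"x": 3, "y": 4})


# Dynamic style: ordinary Python runs each step immediately, so
# intermediate values can be printed or inspected in a debugger.
def eager(x, y):
    squared = x * x  # available right away
    return squared + y


dynamic_result = eager(3, 4)
print(static_result, dynamic_result)
```

Both styles compute the same value; what changes is when the work happens and how easily the intermediate steps can be inspected.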

Use Cases of Jupyter Notebooks
1. Prototyping and testing AI/ML models interactively.
2. Creating shareable data science reports with code, outputs, and visualizations.


spaCy vs. Basic Python
- spaCy includes pre-trained models for tokenization, POS tagging, NER, and dependency parsing.
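- Unlike basic string operations, spaCy models linguistic structure: its pipelines tag each token's part of speech and syntactic role, and models that ship with word vectors add a measure of semantic similarity.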
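
To make the contrast concrete, here is a small sketch of what basic string operations do with one of the review sentences used later in this notebook. It uses plain Python only; spaCy's tokenizer, by comparison, splits trailing punctuation into its own token.

```python
text = "Apple's MacBook Pro is sleek and fast."

# Whitespace splitting keeps punctuation glued to the neighbouring word
tokens = text.split()
print(tokens)

# An exact-match lookup therefore misses the final word
found_as_token = "fast" in tokens

# Substring checks go wrong the other way: they ignore word boundaries
found_as_substring = "fast" in "I ate breakfast"

print(found_as_token, found_as_substring)
```

Neither check reflects whether the word "fast" actually occurs, which is the kind of problem a proper tokenizer removes.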


Comparative Analysis Table

| Feature               | Scikit-learn                    | TensorFlow                           |
| --------------------- | ------------------------------- | ------------------------------------ |
| **Focus**             | Classical ML (SVMs, trees, KNN) | Deep Learning (CNNs, RNNs, DNNs)     |
| **Beginner Friendly** | Very high                       | Moderate (TF 2.x easier than TF 1.x) |
| **Community Support** | Strong (esp. education)         | Very strong (industry + research)    |


In [1]:
# Task 1: Classical ML with Scikit-learn
# task1_scikit_iris.py

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
import pandas as pd
import numpy as np

# Load dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)

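# Simulate missing values
# X.iloc[0, 0] = np.nan

# Encode labels (already encoded as numbers in this dataset)

# Train/Test split (done before imputation so test data does not influence the fill values)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Handle missing values: simple mean imputation using training-set means only
train_means = X_train.mean()
X_train = X_train.fillna(train_means)
X_test = X_test.fillna(train_means)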

# Initialize and train classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Predict
y_pred = clf.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision (macro):", precision_score(y_test, y_pred, average='macro'))
print("Recall (macro):", recall_score(y_test, y_pred, average='macro'))


Accuracy: 1.0
Precision (macro): 1.0
Recall (macro): 1.0
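
Macro averaging computes precision and recall for each class separately and then takes their unweighted mean. The sketch below works this out by hand in plain Python; the function name and the label lists are hypothetical and are not the Iris results above.

```python
def macro_precision_recall(y_true, y_pred):
    """Unweighted mean of per-class precision and recall."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # A class with no predictions (or no true samples) scores 0.0
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n


# Hypothetical labels chosen so that the classes score differently
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
precision, recall = macro_precision_recall(y_true, y_pred)
print(precision, recall)
```

Because every class counts equally, a poorly predicted rare class lowers the macro score as much as a poorly predicted common one.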


In [2]:
pip install tensorflow

Note: you may need to restart the kernel to use updated packages.


In [3]:
import tensorflow as tf
print(tf.__version__)

ImportError: Traceback (most recent call last):
  File "C:\Users\KARIS\anaconda3\Lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 73, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed.


Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors for some common causes and solutions.
If you need help, create an issue at https://github.com/tensorflow/tensorflow/issues and include the entire stack trace above this error message.

In [None]:
# task2_tensorflow_mnist.py

import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Preprocess data: normalize and expand dimensions
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# Build CNN model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train model
model.fit(x_train, y_train, epochs=5, validation_split=0.1)

# Evaluate model
test_loss, test_acc = model.evaluate(x_test, y_test)
print("\nTest accuracy:", test_acc)

# Predict on first 5 test samples
predictions = model.predict(x_test[:5])

# Visualize predictions
for i in range(5):
    plt.imshow(x_test[i].squeeze(), cmap='gray')
    pred_label = predictions[i].argmax()  # index of the most probable class
    plt.title(f"Predicted: {pred_label}, True: {y_test[i]}")
    plt.axis('off')
    plt.show()
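
The tensor shapes and parameter counts of the CNN above can be checked by hand. The following is a sketch in plain Python, assuming the Keras defaults used in that cell (no padding, stride 1 for convolutions, pooling stride equal to the pool size); the helper names are made up for this example.

```python
def conv_out(size, kernel, stride=1):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


def pool_out(size, pool):
    """Output length of non-overlapping max pooling along one axis."""
    return size // pool


side = 28
side = conv_out(side, 3)   # Conv2D(32, (3, 3))   -> 26
side = pool_out(side, 2)   # MaxPooling2D((2, 2)) -> 13
side = conv_out(side, 3)   # Conv2D(64, (3, 3))   -> 11
flat = side * side * 64    # Flatten over 64 channels

# Weights plus biases for each trainable layer
params = {
    "conv1": 3 * 3 * 1 * 32 + 32,
    "conv2": 3 * 3 * 32 * 64 + 64,
    "dense1": flat * 64 + 64,
    "dense2": 64 * 10 + 10,
}
total = sum(params.values())
print(flat, total)
```

Most of the parameters sit in the first Dense layer, because the Flatten output is large when the second convolution is not followed by pooling.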


In [None]:
# task3_spacy_nlp.py

import spacy

# Load English NLP pipeline
nlp = spacy.load("en_core_web_sm")

# Sample Amazon reviews
reviews = [
    "I love my new Samsung Galaxy! The screen is amazing.",
    "The Sony headphones were terrible. Never buying again.",
    "Apple’s MacBook Pro is sleek and fast."
]

# Rule-based sentiment
positive_words = ['love', 'amazing', 'great', 'excellent', 'fast', 'sleek']
negative_words = ['terrible', 'bad', 'worst', 'slow', 'never']

for review in reviews:
    doc = nlp(review)
    print(f"\nReview: {review}")
    print("Entities:")
    for ent in doc.ents:
        print(f" - {ent.text} ({ent.label_})")

    # Simple sentiment analysis: compare whole tokens rather than substrings,
    # so that "fast" does not match inside "breakfast"
    tokens = {token.lower_ for token in doc}
    sentiment = "positive" if tokens & set(positive_words) else \
                "negative" if tokens & set(negative_words) else "neutral"
    print("Sentiment:", sentiment)


1. Ethical Considerations (Bias + Mitigation)

Potential Biases

A) In the MNIST Model
- Bias type: dataset bias. MNIST contains only grayscale digits written by American Census Bureau employees and high school students.
- Risk: a model trained solely on MNIST may fail on digits from other numeral systems (e.g., Eastern Arabic or Devanagari numerals) or on handwriting styles that are rare in the dataset.
- Example: it may underperform on handwriting from elderly or left-handed writers.

B) In the Amazon Reviews NLP Task
Bias type: limitations of the sentiment analysis. It might:
- Misclassify sarcasm (e.g., "Great, it broke in 2 hours")
- Assume negativity for non-standard English or grammar
- Show skewed results if the word lists, or any training reviews, come mostly from a single demographic

Mitigation Strategies

A) TensorFlow Fairness Indicators (for MNIST)
- Reports performance metrics across slices of the data (e.g., accuracy by gender or age group, if such metadata exists).
- Helps reveal whether the model performs well only for a dominant subgroup.
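
The idea behind slice-based evaluation can be sketched in plain Python. This is not the Fairness Indicators API; the function name, the group tags, and the labels are all hypothetical, and the MNIST arrays loaded earlier contain only images and digit labels, so slice tags would have to come from another source.

```python
from collections import defaultdict


def accuracy_by_slice(y_true, y_pred, slices):
    """Return {slice_name: accuracy} for parallel lists of labels and slice tags."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, s in zip(y_true, y_pred, slices):
        total[s] += 1
        correct[s] += int(t == p)
    return {s: correct[s] / total[s] for s in total}


# Hypothetical predictions tagged with a made-up writer group
y_true = [3, 7, 1, 9, 4, 4]
y_pred = [3, 7, 7, 9, 4, 9]
slices = ["group_a", "group_a", "group_b", "group_b", "group_a", "group_b"]

result = accuracy_by_slice(y_true, y_pred, slices)
print(result)
```

A large gap between slices, as in this made-up data, is the signal to look for: overall accuracy can hide a subgroup on which the model performs poorly.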

B) spaCy Rule-Based Adjustments (for Reviews)
You can:
- Add rules to detect sarcastic expressions
- Normalize informal language (e.g., “luv” → “love”)
- Customize entity types to reduce false positives on brand names
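
Normalizing informal language could look like the sketch below. It uses only the standard library, and the `SLANG` map and `normalize` function are hypothetical; a real project would build the map from its own review data and would likely run it before the text reaches the spaCy pipeline.

```python
import re

# Hypothetical slang map, for illustration only
SLANG = {"luv": "love", "gr8": "great", "thx": "thanks"}


def normalize(text):
    """Replace whole-word slang tokens and leave everything else unchanged."""

    def swap(match):
        word = match.group(0)
        # Lookup is case-insensitive; the replacement is always lowercase
        return SLANG.get(word.lower(), word)

    return re.sub(r"[A-Za-z0-9]+", swap, text)


normalized = normalize("Luv this phone, battery is gr8!")
print(normalized)
```

Matching whole alphanumeric runs keeps words such as "glove" intact, since only complete tokens are looked up in the map.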


In [None]:
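import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Register the custom attribute before any component writes to it
Doc.set_extension("sentiment", default=None, force=True)

# spaCy v3 requires pipeline components to be registered under a name
@Language.component("detect_sarcasm")
def detect_sarcasm(doc):
    if "yeah right" in doc.text.lower():
        doc._.sentiment = "sarcasm"
    return doc

# Add the custom rule to the end of the pipeline
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("detect_sarcasm", last=True)

doc = nlp("Yeah right, best product ever.")
print(doc._.sentiment)  # sarcasm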


In [None]:
# Troubleshooting Challenge: Buggy TensorFlow Code
# Buggy Code Example
# Here’s a common student mistake:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(28, 28)),
    tf.keras.layers.Dense(10)
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)

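# Issues:
# input_shape=(28, 28) is 2D. Dense acts only on the last axis, so the model
#   outputs shape (28, 10) per image instead of (10,), which cannot match the labels.
# loss='categorical_crossentropy' requires one-hot labels, but labels are integers.
# Missing Flatten() layer before Dense layers
# The final Dense(10) has no softmax, so it emits logits while the loss expects probabilities.

# Fixed Version
model = tf.keras.Sequential([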
    tf.keras.layers.Flatten(input_shape=(28, 28)),  # Fix: Flatten the image
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')  # Add softmax for classification
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # Fix: use sparse for integer labels
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
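
The two loss functions named in the fix compute the same quantity and differ only in the label format they accept. The sketch below checks this for a single sample in plain Python; the function names mirror the Keras loss names for readability but are local re-implementations, and the logits are made up.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sparse_categorical_crossentropy(label, probs):
    """Label is an integer class index."""
    return -math.log(probs[label])


def categorical_crossentropy(one_hot, probs):
    """Label is a one-hot vector with the same length as probs."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


logits = [2.0, 0.5, -1.0]    # hypothetical outputs of a final Dense(3) layer
probs = softmax(logits)

label = 0                    # integer label, as MNIST provides
one_hot = [1.0, 0.0, 0.0]    # the same label in one-hot form

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot, probs)
print(sparse_loss, dense_loss)
```

Since MNIST labels are integers, the sparse variant avoids the extra step of one-hot encoding them.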
