# Chapter 1 - Core AI Concepts and Tools
This notebook sets up your Python environment and explores core AI concepts such as machine learning, deep learning, and generative models. It shows how to build AI into applications with popular tools such as Scikit-learn, Hugging Face Transformers, Stable Diffusion, and LangChain, and how to check a model for bias with Fairlearn as part of ethical AI practice.

### Block 1: Installing the Python Starter Set
Let's kick things off by installing your 'survival kit' of Python libraries using **pip**, so you're fully equipped with the tools you'll need for development.

In [None]:
# Install fundamental Python libraries for data manipulation, visualization, and AI models
%pip install numpy pandas matplotlib scikit-learn


### Block 2: Importing Core Libraries
In this block, we import the core libraries (**NumPy, Pandas, Matplotlib, and Scikit-learn**) and print their versions to confirm that the setup works.

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import sklearn

# Confirm the setup by printing each library's version
print('NumPy:', np.__version__)
print('Pandas:', pd.__version__)
print('Matplotlib:', matplotlib.__version__)
print('Scikit-learn:', sklearn.__version__)
print('Libraries installed, imported, and ready to go!')

### Block 3: Cleansing a Dataset
We'll clean a small dataset of superhero attributes with a **Pandas DataFrame**. Missing ages are filled with the mean age, and rows with no name are dropped, so the data is ready for analysis or modeling.

In [None]:
# Sample dataset of fictional superheroes
data = {'Name': ['Captain Valor', 'Mighty Max', 'Lady Storm', None],
        'Age': [35, 29, 28, None],
        'Power_Level': [95, 89, 93, 88]}

# Load the dataset into a Pandas DataFrame
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:', df, sep='\n')

# Fill missing values in the 'Age' column with the mean age
df = df.assign(Age=df['Age'].fillna(df['Age'].mean()))

# Remove rows where the 'Name' field is missing
df = df.dropna(subset=['Name'])

# Display the cleaned DataFrame
print('\nCleaned DataFrame:', df, sep='\n')
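A quick way to confirm that the cleaning worked is to count missing values before and after. The sketch below does this with `DataFrame.isna().sum()`. The method chain and the `reset_index` call are stylistic choices, not requirements. In this sample, the only missing age belongs to the row without a name. That row, including its filled-in age, is dropped, so the remaining ages are unchanged.

```python
import pandas as pd

# Same sample data as above
hero_data = {'Name': ['Captain Valor', 'Mighty Max', 'Lady Storm', None],
             'Age': [35, 29, 28, None],
             'Power_Level': [95, 89, 93, 88]}
raw = pd.DataFrame(hero_data)

# Audit: number of missing values in each column before cleaning
print(raw.isna().sum())

# Same cleaning steps as above; mean() skips NaN by default (skipna=True)
cleaned = (raw
           .assign(Age=raw['Age'].fillna(raw['Age'].mean()))
           .dropna(subset=['Name'])
           .reset_index(drop=True))  # renumber rows 0..n-1 after the drop

# Audit again: every column should now report 0
print(cleaned.isna().sum())
print(cleaned)
```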


### Block 4: Comparing Data with Element-Wise Operations
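We'll use **NumPy** element-wise operations to compare superhero abilities with the challenges they face. Dividing each hero's strength and speed by the corresponding object's requirements gives one ratio per feature. A value of 1 or more means the hero meets that requirement.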

In [None]:
# Superhero power matrix (rows: heroes, columns: strength, speed)
# Hero A (Speedster), Hero B (Strongman)
heroes = np.array([[75, 90],   # Moderate strength, high speed
                   [95, 60]])  # High strength, low speed

# Object challenge matrix (rows: objects, columns: strength, speed requirements)
# Object 1: Train, Object 2: Speeding Bullet
# Row i of heroes is compared with row i of objects (Hero A vs Train, Hero B vs Bullet)
objects = np.array([[80, 50],    # High strength, moderate speed required
                    [40, 100]])  # Low strength, very high speed required

# Element-wise division to compare powers to challenges
result = heroes / objects

# Print the power matrices and the result
print("Hero Power Levels:\n", heroes)
print("\nObject Challenges:\n", objects)
print("\nHero-to-Object Power Ratio (Division):\n", result)
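Element-wise division only pairs each hero with the object in the same row. To compare every hero with every object, NumPy broadcasting can add a new axis to each array. The sketch below assumes the hero values are listed in (strength, speed) order, matching the objects matrix and the Speedster/Strongman descriptions. The names `hero_stats` and `challenge_reqs`, and the rule that every ratio must be at least 1, are illustrative choices and not part of the original block.

```python
import numpy as np

# (strength, speed) for each hero and each object, same column order for both
hero_stats = np.array([[75, 90],     # Hero A (Speedster)
                       [95, 60]])    # Hero B (Strongman)
challenge_reqs = np.array([[80, 50],     # Object 1: Train
                           [40, 100]])   # Object 2: Speeding Bullet

# Broadcasting: (2, 1, 2) / (1, 2, 2) -> (2, 2, 2), i.e. every hero vs every object
ratios = hero_stats[:, np.newaxis, :] / challenge_reqs[np.newaxis, :, :]

# can_handle[i, j] is True only if hero i meets every requirement of object j
can_handle = np.all(ratios >= 1, axis=2)
print(can_handle)
```

Only Hero B can handle the train. Neither hero reaches the bullet's speed requirement.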

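### Block 5: Calculating the Dot Product and Cosine Similarity
We use the **NumPy** dot product to measure how strongly two superheroes' attribute vectors (strength, speed, intelligence) point in the same direction. The raw dot product also grows with the length of the vectors. We therefore divide it by the product of their magnitudes to get the **cosine similarity**, which ranges from -1 to 1 and compares only the shape of the two power profiles.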


In [None]:
# Create two superhero power vectors (strength, speed, intelligence)
hero_1 = np.array([85, 90, 75])  # Hero A: strong, fast, smart
hero_2 = np.array([80, 85, 70])  # Hero B: slightly less in all areas

# Calculate the dot product to compare their power profiles
dot_product = np.dot(hero_1, hero_2)

# Calculate the magnitude (norm) of each vector
magnitude_hero_1 = np.linalg.norm(hero_1)
magnitude_hero_2 = np.linalg.norm(hero_2)

# Calculate the cosine similarity
cosine_similarity = dot_product / (magnitude_hero_1 * magnitude_hero_2)

# Print the power profiles, dot product, and cosine similarity
print("Hero A Power Profile:", hero_1)
print("Hero B Power Profile:", hero_2)
print("\nDot Product:", dot_product)
print("\nCosine Similarity (Range: -1 to 1):", cosine_similarity)

# Interpret the cosine similarity
if cosine_similarity > 0.9:
    similarity_text = "The heroes have very similar power profiles."
elif cosine_similarity > 0.7:
    similarity_text = "The heroes have somewhat similar power profiles."
else:
    similarity_text = "The heroes have quite different power profiles."

print("\nInterpretation:", similarity_text)
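Scaling one vector shows why the normalization matters. In the sketch below, halving Hero B's attributes halves the dot product but leaves the cosine similarity unchanged. The last line cross-checks the hand computation against scikit-learn's `cosine_similarity`, which expects 2-D inputs with one vector per row. The helper `cosine` and the halved vector `hero_2_half` are illustrative additions.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity as sk_cosine

hero_1 = np.array([85, 90, 75])
hero_2 = np.array([80, 85, 70])
hero_2_half = hero_2 * 0.5  # same profile shape, half the magnitude

def cosine(a, b):
    """Dot product divided by the product of the vector norms."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(np.dot(hero_1, hero_2), np.dot(hero_1, hero_2_half))  # dot product halves
print(cosine(hero_1, hero_2), cosine(hero_1, hero_2_half))  # cosine unchanged

# Cross-check with scikit-learn (reshape each vector into a single-row 2-D array)
print(sk_cosine(hero_1.reshape(1, -1), hero_2.reshape(1, -1))[0, 0])
```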

### Block 6: Optimizing Attributes with Gradients
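We use **NumPy** to compute the difference between the target and current attributes. This difference shows how much each attribute must change and in which direction. Strictly speaking, it is the negative gradient of the squared-error loss 0.5 × ‖current − target‖² with respect to the current values, so it points in the direction that reduces the error.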

In [None]:
# Current and target attribute levels (e.g., strength, speed)
current_values = np.array([60, 80])
target_values = np.array([90, 100])

# Calculate the gradient showing required improvement to meet targets
gradient = target_values - current_values

# Display current levels, targets, and gradient values
print('Current Attributes (strength, speed):', current_values)
print('Target Attributes (strength, speed):', target_values)
print('\nGradient (Improvement Needed):', gradient)

# Explanation of output
if np.all(gradient >= 0):
    explanation = "The gradient values indicate how much the superhero needs to improve in each attribute to reach the target levels."
else:
    explanation = "At least one attribute already exceeds its target (negative value), so not every attribute needs improvement."

print("\nExplanation:", explanation)
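In model training, the gradient is used repeatedly: each step moves the parameters a small distance against the gradient of the loss. The sketch below applies plain gradient descent to the same attributes using the loss 0.5 × ‖current − target‖². The learning rate of 0.1 and the 100 steps are assumed values chosen for illustration. With a learning rate of 1, a single step would land exactly on the target. Smaller steps show the gradual approach typical of training.

```python
import numpy as np

current = np.array([60.0, 80.0])   # strength, speed
target = np.array([90.0, 100.0])
learning_rate = 0.1  # assumed step size

for step in range(100):
    grad = current - target                    # gradient of 0.5 * ||current - target||^2
    current = current - learning_rate * grad   # step against the gradient
    if step % 25 == 0:
        loss = 0.5 * np.sum((current - target) ** 2)
        print(f'step {step:3d}  current={np.round(current, 2)}  loss={loss:.4f}')

print('final:', np.round(current, 2))
```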

### Block 7: Estimating Success Probability
We use **NumPy** to divide each superhero attribute by its required threshold. The result is a simple readiness ratio rather than a true probability, because it can exceed 1. A value of 1 or more means the threshold is met.


In [None]:
# Define the actual values (e.g., strength and speed)
actual_values = np.array([80, 90])

# Define the threshold required for success
required_thresholds = np.array([70, 85])

# Ratio of actual values to required thresholds
# (a readiness score, not a true probability: values >= 1 mean the threshold is met)
probability_of_success = actual_values / required_thresholds

# Print the actual values, thresholds, and calculated ratios
print('Actual Values (strength, speed):', actual_values)
print('Required Thresholds (strength, speed):', required_thresholds)
print('\nSuccess Ratio (actual / threshold):', probability_of_success)

# Explanation of output
if np.all(probability_of_success >= 1):
    explanation = "All values meet or exceed their thresholds, suggesting a high chance of success."
else:
    explanation = "At least one value is below its threshold, lowering the chance of success."

print("\nExplanation:", explanation)
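To turn the margin above each threshold into a value that really lies between 0 and 1, one common option is the logistic (sigmoid) function. The sketch below is a hypothetical mapping, not part of the original method. The `scale` of 10 points is an assumed parameter: each extra 10 points of margin multiplies the odds of success by e. Multiplying the per-attribute values also assumes the attributes succeed independently.

```python
import numpy as np

actual_values = np.array([80, 90])
required_thresholds = np.array([70, 85])
scale = 10.0  # hypothetical: points of margin that multiply the odds by e

margin = actual_values - required_thresholds     # positive = above threshold
p_each = 1 / (1 + np.exp(-margin / scale))       # logistic squashing into (0, 1)
p_all = np.prod(p_each)                          # assumes independent attributes

print('Per-attribute success probability:', np.round(p_each, 3))
print('Probability all attributes succeed:', round(float(p_all), 3))
```

A margin of exactly 0 maps to 0.5. Larger margins approach 1 without ever exceeding it.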

### Block 8: Predicting Revenue Using Linear Regression
This block uses **Scikit-learn** to fit a **linear regression** that predicts superhero movie box office revenue from production budget, a simple example of financial forecasting. We'll also predict the revenue for a $400 million budget. That budget lies beyond the largest one in the training data, so the prediction is an extrapolation and should be treated with caution.


In [None]:
# Import Linear Regression package from Scikit-learn
from sklearn.linear_model import LinearRegression

# Example data: production budgets (input) and box office revenue (output)
budgets = np.array([100, 150, 200, 250, 300]).reshape(-1, 1)  # in millions
box_office = np.array([300, 400, 600, 750, 900])  # in millions

# Create and train the linear regression model
model = LinearRegression()
model.fit(budgets, box_office)

# Predict box office revenue based on production budgets
predicted_revenue = model.predict(budgets)

# Predict box office revenue for a $400 million budget
new_budget = np.array([[400]])  # in millions
predicted_new_revenue = model.predict(new_budget)

# Print the slope (m), intercept (b), and the prediction for $400 million
print(f"Slope (m): {model.coef_[0]}")
print(f"Intercept (b): {model.intercept_}")
print(f"Predicted Box Office Revenue for $400 million budget: {predicted_new_revenue[0]:.2f} million")

# Plot the data points and the regression line, extended to cover the 400 million prediction
line_budgets = np.linspace(100, 400, 50).reshape(-1, 1)
plt.scatter(budgets, box_office, color='blue', label='Actual Revenue')
plt.plot(line_budgets, model.predict(line_budgets), color='red', label='Regression Line')
plt.scatter(new_budget, predicted_new_revenue, color='green', marker='x', s=100, label='Prediction for $400M')
plt.xlabel('Production Budget (in millions)')
plt.ylabel('Box Office Revenue (in millions)')
plt.title('Linear Regression: Box Office Revenue vs Production Budget')
plt.legend()
plt.show()
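Ordinary least squares has a closed-form solution, so the slope and intercept can be cross-checked with NumPy alone. `np.polyfit` with degree 1 returns `[slope, intercept]`. The sketch also computes R², the share of the revenue variance that the line explains. The variable names `x_budget` and `y_revenue` are illustrative. For the Scikit-learn model above, `model.score(budgets, box_office)` returns the same R².

```python
import numpy as np

x_budget = np.array([100, 150, 200, 250, 300], dtype=float)    # in millions
y_revenue = np.array([300, 400, 600, 750, 900], dtype=float)   # in millions

# Least-squares line: polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(x_budget, y_revenue, 1)
fitted = slope * x_budget + intercept

# R^2 = 1 - SS_residual / SS_total
ss_res = np.sum((y_revenue - fitted) ** 2)
ss_tot = np.sum((y_revenue - y_revenue.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f'slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.4f}')
print(f'Prediction at 400: {slope * 400 + intercept:.2f}')
```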

### Block 9: Classifying Emails Using Naive Bayes
We train Scikit-learn's **Multinomial Naive Bayes** classifier to label superhero-themed emails as **spam or not spam**, a basic **text classification** task in natural language processing.

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Sample dataset: 5 spam and 5 non-spam emails
emails = [
    'Unlock Unlimited Superpowers Today!',
    'Claim Your Free Kryptonite—Limited Offer!',
    'Win a Year’s Supply of Superhero Gear!',
    'Congratulations! You’ve Been Selected as the Next Avenger!',
    'Free Batmobile Test Drive!',
    'Quarterly sales report is due next Monday',
    'Please review the attached budget proposal for next quarter',
    'Reminder: Team meeting on Thursday at 9 AM',
    'Your invoice for services rendered is attached',
    'Client feedback report from last week’s presentation'
]

# Labels for spam (1) and non-spam (0)
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Convert emails to bag-of-words counts of single words and word pairs (uni-grams and bi-grams), ignoring English stop words
vectorizer = CountVectorizer(ngram_range=(1, 2), stop_words='english')
X = vectorizer.fit_transform(emails)

# Create and train the Naive Bayes classifier
model = MultinomialNB()
model.fit(X, labels)

# Test the model with new emails
new_emails = [
    'Congratulations, You’ve Been Chosen as the Next Avenger!',
    'Final review of training schedule',
    'Free ticket to the Superhero Conference—Register now!',
    'Please review the attached project plan for the upcoming quarter'
]
X_new = vectorizer.transform(new_emails)
predictions = model.predict(X_new)

# Display predictions for each email
for email, prediction in zip(new_emails, predictions):
    print(f"Email: '{email}' is {'spam' if prediction == 1 else 'not spam'}")
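To see what `MultinomialNB` computes, the sketch below scores two emails by hand. Each class gets a log prior. For each word, the word's count is multiplied by the log of a smoothed word probability for that class, and these terms are added to the prior. The email goes to the class with the higher total. The tiny training set is hypothetical rather than the emails above, so the arithmetic stays readable. Laplace smoothing uses `alpha=1.0`, which matches scikit-learn's default. The final line checks that scikit-learn agrees.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training data: 1 = spam, 0 = not spam
train_texts = ['free superpowers today', 'claim free gear now',
               'team meeting on thursday', 'budget report attached']
train_labels = np.array([1, 1, 0, 0])

vec = CountVectorizer()
X_counts = vec.fit_transform(train_texts).toarray()

alpha = 1.0  # Laplace smoothing
classes = np.array([0, 1])
log_prior = np.log([np.mean(train_labels == c) for c in classes])

# Smoothed per-class word counts -> log P(word | class)
counts = np.array([X_counts[train_labels == c].sum(axis=0) for c in classes]) + alpha
log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))

# Score = log prior + sum over words of count * log P(word | class)
test_counts = vec.transform(['free gear meeting', 'report on budget']).toarray()
scores = test_counts @ log_likelihood.T + log_prior
manual_pred = classes[np.argmax(scores, axis=1)]

sk_pred = MultinomialNB(alpha=alpha).fit(X_counts, train_labels).predict(test_counts)
print('manual:', manual_pred, ' scikit-learn:', sk_pred)
```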

### Block 10: Image Classification with a Pre-Trained Model
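We'll use a pre-trained **Hugging Face** model to classify a superhero comic book image, demonstrating **deep learning** for image classification. No model is specified, so `pipeline('image-classification')` downloads a default model chosen by the library. It also prints a warning that relying on the default is not recommended in production.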

In [None]:
# Import necessary libraries
from transformers import pipeline
from PIL import Image
import requests

# Load an image for classification from a URL
# The image is of a superhero comic book.
url = 'https://opensourceai-book.github.io/code/images/C01-Opensource-ComicBook.jpg'
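response = requests.get(url, stream=True, timeout=30)
response.raise_for_status()  # fail early on a bad URL or HTTP error
image = Image.open(response.raw).convert('RGB')  # ensure a 3-channel RGB image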

# Load a pre-trained image classification model from Hugging Face
classifier = pipeline('image-classification')

# Classify the image and retrieve predictions
predictions = classifier(image)

# Print the top prediction with its confidence score
print(f"Top Prediction: {predictions[0]['label']} with a score of {predictions[0]['score']:.2f}")
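The `score` values come from the model's raw outputs (logits). For single-label classifiers such as the default one, these are typically converted with a softmax, which makes them positive and sum to 1. The sketch below reproduces that step and the sorted list of predictions. The three labels and their logits are hypothetical values, not output from the model above.

```python
import numpy as np

# Hypothetical logits for three hypothetical labels
class_names = np.array(['comic book', 'book jacket', 'poster'])
logits = np.array([2.0, 1.0, 0.1])

# Softmax: subtract the max for numerical stability, exponentiate, normalize
exp = np.exp(logits - logits.max())
scores = exp / exp.sum()

# Sort from highest to lowest score, like the pipeline's list of predictions
order = np.argsort(scores)[::-1]
sketch_predictions = [{'label': str(class_names[i]), 'score': float(scores[i])} for i in order]
print(sketch_predictions[0])
```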

### Block 11: Installing Libraries for Image Generation
To generate images with **Stable Diffusion** (generative AI), we first install the required libraries:

- `diffusers` for the diffusion pipeline
- `transformers` for the text encoder it uses
- `torch` (**PyTorch**) for tensor computation and GPU support

In [None]:
# Install required libraries for Stable Diffusion
%pip install diffusers transformers torch

### Block 12: Generating Images Using Stable Diffusion
We will generate a comic-style image with a pre-trained Stable Diffusion model. A **GPU** with half-precision (float16) weights is used when one is available. Generation also works on a CPU, but it is much slower.

In [None]:
# Import necessary libraries for image generation
from diffusers import StableDiffusionPipeline
import torch
from IPython.display import display

# Use the GPU with half precision if available; otherwise fall back to the CPU in full precision
device = 'cuda' if torch.cuda.is_available() else 'cpu'
torch_dtype = torch.float16 if device == 'cuda' else torch.float32

# Load Stable Diffusion v1.5. The original 'runwayml/stable-diffusion-v1-5' repository
# has been removed from the Hub; this ID points to its community-maintained mirror.
pipe = StableDiffusionPipeline.from_pretrained('stable-diffusion-v1-5/stable-diffusion-v1-5', torch_dtype=torch_dtype).to(device)

# Define a text prompt for generating an image
prompt = 'a family-friendly comic-style supervillain using the colors primary red, bright blue, sunny yellow, black and white'

# Fix the random seed so the same prompt produces the same image on reruns
generator = torch.Generator(device=device).manual_seed(42)

# Generate the image with fewer inference steps than the default of 50, for speed
image = pipe(prompt, num_inference_steps=20, generator=generator).images[0]

# Display the generated image
display(image)

### Block 13: Setting Up Hugging Face Access Token
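We store a **Hugging Face** access token in an environment variable so that later blocks can authenticate with the **Hugging Face Hub**. A token is required for gated models and for the hosted Inference API used in Block 15. Many public models and datasets can be downloaded without one.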

In [None]:
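# Import libraries for environment variables and hidden input
import os
from getpass import getpass

# Prompt for your Hugging Face access token instead of hard-coding it in the notebook.
# LangChain's HuggingFaceHub reads HUGGINGFACEHUB_API_TOKEN; the huggingface_hub library reads HF_TOKEN.
token = getpass('Hugging Face access token: ')
os.environ['HUGGINGFACEHUB_API_TOKEN'] = token
os.environ['HF_TOKEN'] = token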

### Block 14: Installing Required Packages for Hugging Face and LangChain
In this block, we install the necessary packages for using **Hugging Face Hub and LangChain**, which include community modules and tools for building language model applications.

In [None]:
# Install required packages for Hugging Face and LangChain usage
%pip install huggingface_hub langchain langchain-community

### Block 15: Building a Simple Q&A Chatbot Using LangChain
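We will set up a basic **Q&A chatbot** using **LangChain** and a Mistral instruct model hosted on Hugging Face. It demonstrates a prompt template with placeholders (`{language}` and `{input}`). It also shows how to chain the prompt into the model with LangChain's `|` operator.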

In [None]:
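# Import necessary libraries for building a LangChain chatbot
from langchain_core.prompts import ChatPromptTemplate
# HuggingFaceHub lives in langchain-community (deprecated in newer releases in favor of langchain_huggingface.HuggingFaceEndpoint)
from langchain_community.llms import HuggingFaceHub

# Connect to a Mistral instruct model through the Hugging Face Inference API (uses the token set in Block 13)
llm = HuggingFaceHub(repo_id='mistralai/Mistral-Small-Instruct-2409', model_kwargs={'temperature': 0.1})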

# Define a prompt template for the chatbot
prompt = ChatPromptTemplate.from_messages([
    ('system', 'Please respond in {language} in 20 words or less.'),
    ('human', '{input}')
])

# Define the chatbot chain where the prompt is sent to the language model
chain = prompt | llm

# Invoke the chatbot with a sample input
response = chain.invoke({'input': 'Who is the tallest superhero?', 'language': 'English'})

# Print the chatbot's response
print(response)
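The `|` operator composes steps: the output of the left side becomes the input of the right side. The sketch below imitates that idea in plain Python so it can run without a model or API token. The `Step` class, `prompt_step`, and `fake_llm` are hypothetical stand-ins, not LangChain's actual `Runnable` implementation. The prompt text only approximates what a string-based LLM receives from the chat template.

```python
class Step:
    """Hypothetical stand-in for a chainable component (not LangChain's Runnable)."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # `a | b` builds a new Step that runs a, then passes its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.fn(value)


# Hypothetical prompt step: fills the template variables
prompt_step = Step(lambda v: f"System: Please respond in {v['language']} in 20 words or less.\nHuman: {v['input']}")

# Hypothetical model step: a fake LLM that only reports what it received
fake_llm = Step(lambda text: f"[fake model received {len(text.splitlines())} lines]")

demo_chain = prompt_step | fake_llm
print(demo_chain.invoke({'input': 'Who is the tallest superhero?', 'language': 'English'}))
```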

### Block 16: Dynamic Agent Task Handling with LangChain
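We create a LangChain **ReAct-style agent** that decides at run time which tool to call, based on each tool's description. Here the tools are SerpAPI for web search and `llm-math` for calculations. Block 14 does not install everything this block needs. In addition to OpenAI and SerpAPI API keys, you need these packages:

- `langchain-openai`
- `google-search-results` (used by the SerpAPI tool)
- `numexpr` (used by `llm-math`)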

In [None]:
# Import necessary libraries for LangChain agent setup
from langchain.agents import AgentType, initialize_agent
from langchain_community.agent_toolkits.load_tools import load_tools
from langchain_openai import OpenAI
from getpass import getpass
import os

# Prompt for the OpenAI and SerpAPI keys instead of hard-coding them in the notebook
os.environ['OPENAI_API_KEY'] = getpass('OpenAI API key: ')
os.environ['SERPAPI_API_KEY'] = getpass('SerpAPI API key: ')

# Initialize the OpenAI LLM with a low temperature for more deterministic output
llm = OpenAI(temperature=0.1)

# Load necessary tools for the agent, including SERPAPI for searches and llm-math for calculations
tools = load_tools(['serpapi', 'llm-math'], llm=llm)

# Initialize the LangChain agent with the tools loaded and set it for dynamic response handling
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

# Query the agent to retrieve and calculate data
query = 'Add average rating of Spider-Man and Iron-Man movies'
agent.invoke(query)
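A ReAct agent alternates between choosing an action (a tool plus its input) and reading the tool's output as an observation. The sketch below imitates only the tool-dispatch half of that loop. The `calculator` function is a hypothetical stand-in for `llm-math`, restricted to numbers and `+ - * /`. The list of actions is scripted by hand, whereas a real agent has the LLM generate each one from the tool descriptions and earlier observations. The numbers are placeholders, not movie ratings.

```python
import ast
import operator

# Hypothetical calculator tool: evaluates numbers with + - * / and unary minus only
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expression: str) -> float:
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError('Unsupported expression')
    return evaluate(ast.parse(expression, mode='eval').body)

TOOLS = {'Calculator': calculator}

# Scripted stand-in for the LLM's choices (placeholder numbers)
scripted_actions = [('Calculator', '(3 + 5) / 2'), ('Calculator', '4 * 2')]

for tool_name, tool_input in scripted_actions:
    observation = TOOLS[tool_name](tool_input)  # Action -> Observation
    print(f'Action: {tool_name}[{tool_input}]  Observation: {observation}')
```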

### Block 17: Installing Fairlearn for Model Bias Evaluation
To evaluate and mitigate bias in machine learning models, we install the **Fairlearn** library, which offers tools for assessing fairness in predictions.

In [None]:
# Install Fairlearn library for evaluating model fairness
%pip install fairlearn

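### Block 18: Detecting Bias Using Fairlearn
This example trains a logistic regression model on a simulated loan dataset. It then uses **Fairlearn**'s `MetricFrame` to compare selection rates (the share of applicants predicted to be approved) across gender groups. This is the detection step of ethical AI practice. Fairlearn also provides mitigation algorithms, for example in `fairlearn.reductions`, but this block does not apply them. The simulated approvals depend only on income and age, so any gap between the groups comes from the model and the random sample, not from the labeling rule.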

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from fairlearn.metrics import MetricFrame, selection_rate

# Set a random seed for reproducibility
np.random.seed(42)

# Generate simulated data (100 samples with attributes like age, income, and gender)
n_samples = 100
ages = np.random.randint(18, 66, size=n_samples)
incomes = np.random.randint(20000, 120001, size=n_samples)
genders = np.random.randint(0, 2, size=n_samples)  # 0 for Male, 1 for Female

# Simulate loan approval status based on income and age
loan_approval = (incomes > 50000) & (ages < 50)
loan_approval = loan_approval.astype(int)  # Convert boolean to integer

# Create a DataFrame for the dataset
data = pd.DataFrame({
    'age': ages,
    'income': incomes,
    'gender': genders,
    'loan_approval': loan_approval
})

# Split the data into features and target variable
X = data[['age', 'income', 'gender']]
y = data['loan_approval']

# Train-test split with 20% data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

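# Train a logistic regression model. Scaling the features first helps the solver converge,
# because income is in the tens of thousands while gender is 0 or 1.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)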

# Evaluate the model's fairness using Fairlearn
predictions = model.predict(X_test)
metric_frame = MetricFrame(
    metrics=selection_rate,
    y_true=y_test,
    y_pred=predictions,
    sensitive_features=X_test['gender']
)

# Display bias evaluation results
print('\nSelection rates across gender groups:')
print(metric_frame.by_group)
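A common single-number summary of these results is the demographic parity difference: the gap between the highest and lowest selection rate across groups, where 0 means parity. The sketch below computes it by hand with pandas on hypothetical predictions (not the model above), so the arithmetic is easy to follow. Fairlearn offers the same summary as `metric_frame.difference()` and as `fairlearn.metrics.demographic_parity_difference(y_true, y_pred, sensitive_features=...)`.

```python
import pandas as pd

# Hypothetical predictions and group labels (0 = Male, 1 = Female), for illustration only
results = pd.DataFrame({
    'gender':     [0, 0, 0, 0, 1, 1, 1, 1],
    'prediction': [1, 1, 0, 1, 1, 0, 0, 0],
})

# Selection rate = share of positive predictions within each group
rates = results.groupby('gender')['prediction'].mean()
print(rates)

# Demographic parity difference: highest minus lowest selection rate
dp_difference = rates.max() - rates.min()
print(f'Demographic parity difference: {dp_difference:.2f}')
```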