# AI Tutorial by AI - Complete Introduction

Welcome to this comprehensive AI/Machine Learning tutorial! This notebook covers the fundamentals of AI and ML with hands-on examples.

## What You'll Learn
1. **Python for Data Science** - NumPy, Pandas, and data manipulation
2. **Data Visualization** - Creating compelling charts and plots
3. **Machine Learning** - Classification, model evaluation, and decision boundaries
4. **Neural Networks** - Understanding deep learning basics

## Prerequisites
- Basic Python knowledge
- Curiosity about AI and machine learning!

Let's start by importing the necessary libraries:

In [None]:
# Import essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning libraries
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Libraries imported successfully! 🎉")

## 1. Python for Data Science Basics

Let's start with NumPy and Pandas fundamentals:

In [None]:
# NumPy arrays - the foundation of data science
arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print("NumPy Array:", arr)
print("Matrix Shape:", matrix.shape)
print("Array Mean:", np.mean(arr))
print("Array Standard Deviation:", np.std(arr))

# Mathematical operations
print("\nMathematical Operations:")
print("Squared:", arr ** 2)
print("Greater than 3:", arr > 3)
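
The comparison `arr > 3` returns a boolean array, which can itself be used as an index to pull out the matching elements. The sketch below illustrates that, plus one detail that is easy to miss: `np.std` computes the population standard deviation by default, and `ddof=1` gives the sample version. The variable names (`mask`, `selected`, `pop_std`, `sample_std`) are chosen here for illustration and are not part of the tutorial's cells.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# A boolean mask keeps only the elements where the condition is True
mask = arr > 3
selected = arr[mask]

# Broadcasting: the scalar mean is subtracted from every element
centered = arr - arr.mean()

# np.std divides by N by default; ddof=1 divides by N - 1 instead
pop_std = np.std(arr)
sample_std = np.std(arr, ddof=1)

print("Selected:", selected)
print("Centered:", centered)
print(f"Population std: {pop_std:.4f}, sample std: {sample_std:.4f}")
```

The sample version is what pandas reports by default from `Series.std()`, so the two libraries can disagree on the same data unless `ddof` is set explicitly.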

In [None]:
# Pandas DataFrames - for structured data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'London', 'Tokyo', 'Paris'],
    'Salary': [50000, 60000, 70000, 55000]
}

df = pd.DataFrame(data)
print("Sample DataFrame:")
print(df)

print("\nDataFrame Info:")
print(f"Shape: {df.shape}")
print(f"Average Age: {df['Age'].mean():.1f}")
print(f"Max Salary: ${df['Salary'].max():,}")

# Filtering data
high_earners = df[df['Salary'] > 55000]
print("\nHigh earners (>$55,000):")
print(high_earners[['Name', 'Salary']])
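
Filtering with a single condition extends naturally to combined conditions, derived columns, and sorting. The following is a minimal sketch on the same four-row table; the thresholds and the `MonthlySalary` column are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'London', 'Tokyo', 'Paris'],
    'Salary': [50000, 60000, 70000, 55000],
})

# Combine conditions with & and wrap each one in parentheses
young_high = df[(df['Age'] < 32) & (df['Salary'] > 52000)]

# assign() returns a new frame with the extra column; df itself is unchanged
with_monthly = df.assign(MonthlySalary=df['Salary'] / 12)

# Sort by salary, highest first
ranked = df.sort_values('Salary', ascending=False)

print(young_high[['Name', 'Age', 'Salary']])
print(with_monthly[['Name', 'MonthlySalary']])
print(ranked[['Name', 'Salary']])
```

Python's `and`/`or` do not work element-wise on Series, which is why the bitwise operators `&` and `|` are used here.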

## 2. Data Visualization

Let's load some sample data and create visualizations:

In [None]:
# Load sample datasets
try:
    classification_df = pd.read_csv('sample_data/classification_sample.csv')
    regression_df = pd.read_csv('sample_data/regression_sample.csv')
    print("Sample data loaded successfully!")
    print(f"Classification data shape: {classification_df.shape}")
    print(f"Regression data shape: {regression_df.shape}")
except FileNotFoundError:
    print("Sample data not found. Creating sample data...")
    # Create sample data if not found
    n_samples = 1000
    feature1 = np.random.normal(0, 1, n_samples)
    feature2 = np.random.normal(0, 1, n_samples)
    target = (feature1 + feature2 > 0).astype(int)
    
    classification_df = pd.DataFrame({
        'feature1': feature1,
        'feature2': feature2,
        'target': target
    })
    
    x = np.linspace(0, 10, 200)
    y = 2 * x + 1 + np.random.normal(0, 1, 200)
    
    regression_df = pd.DataFrame({
        'x': x,
        'y': y
    })

# Display first few rows
print("\nClassification Data:")
print(classification_df.head())
print("\nRegression Data:")
print(regression_df.head())

In [None]:
# Create visualization examples
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
fig.suptitle('Data Visualization Examples', fontsize=16)

# Scatter plot of classification data
axes[0, 0].scatter(classification_df['feature1'], classification_df['feature2'], 
                   c=classification_df['target'], cmap='viridis', alpha=0.6)
axes[0, 0].set_title('Classification Data')
axes[0, 0].set_xlabel('Feature 1')
axes[0, 0].set_ylabel('Feature 2')

# Scatter plot of regression data
axes[0, 1].scatter(regression_df['x'], regression_df['y'], alpha=0.6)
axes[0, 1].set_title('Regression Data')
axes[0, 1].set_xlabel('X')
axes[0, 1].set_ylabel('Y')

# Histogram of feature1
axes[1, 0].hist(classification_df['feature1'], bins=30, alpha=0.7, edgecolor='black')
axes[1, 0].set_title('Distribution of Feature 1')
axes[1, 0].set_xlabel('Feature 1')
axes[1, 0].set_ylabel('Frequency')

# Box plot by target class
classification_df.boxplot(column='feature1', by='target', ax=axes[1, 1])
axes[1, 1].set_title('Feature 1 by Target Class')
axes[1, 1].set_xlabel('Target Class')
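axes[1, 1].set_ylabel('Feature 1')
# pandas overwrites the figure title when `by` is used, so restore it
fig.suptitle('Data Visualization Examples', fontsize=16)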

plt.tight_layout()
plt.show()
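
A box plot is a picture of a handful of summary statistics, so it can help to look at the numbers directly. The sketch below rebuilds data with the same recipe as the fallback branch (it is a stand-in, not the CSV file, and `demo_df` and `summary` are names introduced here) and prints per-class statistics for `feature1`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for classification_df, using the fallback recipe
rng = np.random.default_rng(42)
feature1 = rng.normal(0, 1, 1000)
feature2 = rng.normal(0, 1, 1000)
demo_df = pd.DataFrame({
    'feature1': feature1,
    'feature2': feature2,
    'target': (feature1 + feature2 > 0).astype(int),
})

# Per-class summary of feature1: the numbers behind the box plot
summary = demo_df.groupby('target')['feature1'].agg(['count', 'mean', 'median', 'std'])
print(summary.round(3))
```

Because the target is defined as `feature1 + feature2 > 0`, class 1 should show a higher mean for `feature1` than class 0, which is the same shift the box plot displays.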

## 3. Machine Learning Fundamentals

Now let's build and train machine learning models:

In [None]:
# Prepare data for machine learning
X = classification_df[['feature1', 'feature2']]
y = classification_df['target']

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training set size: {X_train.shape[0]}")
print(f"Testing set size: {X_test.shape[0]}")
print(f"Features: {list(X.columns)}")
print(f"Target classes: {sorted(y.unique())}")
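
A plain random split can leave the training and test sets with slightly different class proportions, which matters more when one class is rare. `train_test_split` accepts a `stratify` argument that preserves the proportions. The sketch below uses hypothetical imbalanced labels (about 20% positives) rather than the tutorial's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_demo = rng.normal(0, 1, size=(1000, 2))
# Hypothetical imbalanced labels: roughly 20% positives
y_demo = (rng.random(1000) < 0.2).astype(int)

# stratify=y_demo keeps the class ratio the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(f"Positive rate overall: {y_demo.mean():.3f}")
print(f"Positive rate in train: {y_tr.mean():.3f}")
print(f"Positive rate in test:  {y_te.mean():.3f}")
```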

In [None]:
# Train different models
models = {
    'Logistic Regression': LogisticRegression(random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42)
}

results = {}

for name, model in models.items():
    print(f"\nTraining {name}...")
    
    # Train the model
    model.fit(X_train, y_train)
    
    # Make predictions
    y_pred = model.predict(X_test)
    
    # Calculate accuracy
    accuracy = accuracy_score(y_test, y_pred)
    results[name] = accuracy
    
    print(f"Accuracy: {accuracy:.3f}")
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred, target_names=['Class 0', 'Class 1']))

# Compare model performance
print("\n" + "="*50)
print("MODEL COMPARISON")
print("="*50)
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.3f}")

best_model = max(results, key=results.get)
print(f"\nBest model: {best_model} with accuracy: {results[best_model]:.3f}")
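
Picking a "best" model from a single train/test split can be misleading, because the ranking may change with a different split. Cross-validation averages over several splits. The following is a minimal sketch with scikit-learn's `cross_val_score` on synthetic data generated like the fallback data; the names `X_demo`, `y_demo`, and `scores` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: two Gaussian features, label is 1 when their sum is positive
rng = np.random.default_rng(42)
X_demo = rng.normal(0, 1, size=(1000, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# 5-fold cross-validation: five fits, each scored on a different held-out fold
scores = cross_val_score(LogisticRegression(), X_demo, y_demo, cv=5, scoring='accuracy')

print("Fold accuracies:", np.round(scores, 3))
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

The spread across folds gives a rough sense of how much a single-split accuracy could move by chance.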

In [None]:
# Visualize model performance
plt.figure(figsize=(12, 5))

# Model comparison bar plot
plt.subplot(1, 2, 1)
model_names = list(results.keys())
accuracies = list(results.values())
bars = plt.bar(model_names, accuracies, color=['skyblue', 'lightgreen'])
plt.title('Model Performance Comparison')
plt.ylabel('Accuracy')
plt.ylim(0, 1)

# Add accuracy labels on bars
for bar, accuracy in zip(bars, accuracies):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
             f'{accuracy:.3f}', ha='center', va='bottom', fontweight='bold')

# Decision boundary visualization
plt.subplot(1, 2, 2)

# Use the best model to create decision boundary
best_model_obj = models[best_model]

# Create a mesh
h = 0.02
x_min, x_max = X['feature1'].min() - 1, X['feature1'].max() + 1
y_min, y_max = X['feature2'].min() - 1, X['feature2'].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

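# Make predictions on the mesh (reuse the training column names so
# scikit-learn does not warn about missing feature names)
mesh_points = pd.DataFrame(np.c_[xx.ravel(), yy.ravel()], columns=X.columns)
Z = best_model_obj.predict(mesh_points)
Z = Z.reshape(xx.shape)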

# Plot decision boundary
plt.contourf(xx, yy, Z, alpha=0.8, cmap='RdYlBu')
scatter = plt.scatter(X_test['feature1'], X_test['feature2'], c=y_test, 
                     cmap='RdYlBu', edgecolors='black')
plt.colorbar(scatter)
plt.title(f'Decision Boundary\n{best_model}')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')

plt.tight_layout()
plt.show()

## 4. Neural Networks Introduction

Let's explore the basics of neural networks and implement a simple one from scratch:

In [None]:
# Visualize activation functions
x = np.linspace(-5, 5, 100)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

plt.figure(figsize=(12, 4))

plt.subplot(1, 3, 1)
plt.plot(x, sigmoid(x), 'b-', linewidth=2, label='Sigmoid')
plt.title('Sigmoid Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.grid(True, alpha=0.3)
plt.legend()

plt.subplot(1, 3, 2)
plt.plot(x, tanh(x), 'r-', linewidth=2, label='Tanh')
plt.title('Tanh Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.grid(True, alpha=0.3)
plt.legend()

plt.subplot(1, 3, 3)
plt.plot(x, relu(x), 'g-', linewidth=2, label='ReLU')
plt.title('ReLU Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.grid(True, alpha=0.3)
plt.legend()

plt.tight_layout()
plt.show()

print("Activation functions are crucial for neural networks!")
print("- Sigmoid: Good for binary classification output")
print("- Tanh: Zero-centered, good for hidden layers")
print("- ReLU: Most popular for hidden layers, simple and effective")
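
Training relies on the derivatives of these functions, not just their values. The sketch below writes out the standard derivatives and evaluates them at a few points; the helper names (`sigmoid_grad`, `tanh_grad`, `relu_grad`) are introduced here for illustration. The derivative of ReLU at exactly 0 is undefined, and setting it to 0 there is a common convention assumed in this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1 - s)

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1 - np.tanh(x) ** 2

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (0 at x == 0 by convention)
    return (x > 0).astype(float)

points = np.array([-5.0, 0.0, 5.0])
print("Sigmoid gradient:", sigmoid_grad(points))
print("Tanh gradient:   ", tanh_grad(points))
print("ReLU gradient:   ", relu_grad(points))
```

The sigmoid gradient never exceeds 0.25 and shrinks toward 0 for large inputs, which is one reason ReLU is often preferred in deeper networks.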

In [None]:
# Simple neural network implementation
class SimpleNeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size, learning_rate=0.01):
        # Initialize weights randomly
        self.W1 = np.random.randn(input_size, hidden_size) * 0.1
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.1
        self.b2 = np.zeros((1, output_size))
        self.learning_rate = learning_rate
        self.loss_history = []
    
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-np.clip(x, -250, 250)))
    
    def forward(self, X):
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = self.sigmoid(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.sigmoid(self.z2)
        return self.a2
    
    def train_step(self, X, y):
        m = X.shape[0]
        
        # Forward propagation
        output = self.forward(X)
        
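        # Calculate binary cross-entropy loss (1e-15 guards against log(0))
        loss = -np.mean(y * np.log(output + 1e-15) + (1 - y) * np.log(1 - output + 1e-15))
        self.loss_history.append(loss)
        
        # Backward propagation
        # With a sigmoid output and cross-entropy loss, dL/dz2 reduces to (output - y)
        dZ2 = output - y
        dW2 = (1/m) * np.dot(self.a1.T, dZ2)
        db2 = (1/m) * np.sum(dZ2, axis=0, keepdims=True)
        
        # Chain rule through the hidden layer; a1 * (1 - a1) is the sigmoid derivative
        dA1 = np.dot(dZ2, self.W2.T)
        dZ1 = dA1 * self.a1 * (1 - self.a1)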
        dW1 = (1/m) * np.dot(X.T, dZ1)
        db1 = (1/m) * np.sum(dZ1, axis=0, keepdims=True)
        
        # Update weights
        self.W2 -= self.learning_rate * dW2
        self.b2 -= self.learning_rate * db2
        self.W1 -= self.learning_rate * dW1
        self.b1 -= self.learning_rate * db1
        
        return loss
    
    def predict(self, X):
        output = self.forward(X)
        return (output > 0.5).astype(int)

print("Simple Neural Network class defined!")
print("This network has:")
print("- Input layer: Takes features")
print("- Hidden layer: Processes information")
print("- Output layer: Makes predictions")
print("- Sigmoid activation: Used in both layers; outputs lie between 0 and 1")
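
The backward pass uses `output - y` as the gradient of the loss with respect to the output layer's pre-activation. One way to gain confidence in that shortcut is a numerical gradient check: compare the formula against a finite-difference estimate. The sketch below does this for a few hand-picked values; `bce` and the test inputs are illustrative and independent of the class above.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(z, y):
    # Per-example binary cross-entropy as a function of the pre-activation z
    a = sigmoid(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

z = np.array([-1.5, 0.3, 2.0])   # pre-activations
y = np.array([0.0, 1.0, 1.0])    # true labels

# Analytic gradient used in backpropagation
analytic = sigmoid(z) - y

# Central finite difference: (f(z + eps) - f(z - eps)) / (2 * eps)
eps = 1e-6
numeric = (bce(z + eps, y) - bce(z - eps, y)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print("Analytic:", analytic)
print("Numeric: ", numeric)
print(f"Largest difference: {max_diff:.2e}")
```

The same idea can be applied to every weight in a network, and it is a useful debugging tool when writing backpropagation by hand.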

In [None]:
# Train our neural network
from sklearn.preprocessing import StandardScaler

# Scale the features for better training
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create and train neural network
nn = SimpleNeuralNetwork(input_size=2, hidden_size=5, output_size=1, learning_rate=0.1)

# Training loop
epochs = 1000
y_train_reshaped = y_train.values.reshape(-1, 1)

print("Training neural network...")
for epoch in range(epochs):
    loss = nn.train_step(X_train_scaled, y_train_reshaped)
    
    if epoch % 200 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Make predictions
train_predictions = nn.predict(X_train_scaled)
test_predictions = nn.predict(X_test_scaled)

# Calculate accuracy
train_accuracy = np.mean(train_predictions.ravel() == y_train.to_numpy())
test_accuracy = np.mean(test_predictions.ravel() == y_test.to_numpy())

print("\nTraining completed!")
print(f"Training Accuracy: {train_accuracy:.3f}")
print(f"Testing Accuracy: {test_accuracy:.3f}")
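
For comparison, scikit-learn provides a ready-made multilayer perceptron. The sketch below sets up a network of similar shape (one hidden layer of 5 sigmoid units) inside a pipeline, so that scaling is fitted on the training data only. This is not the same training procedure as the hand-written class: the solver (`lbfgs`), the iteration limit, and the synthetic data are choices made here for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, generated like the fallback classification data
rng = np.random.default_rng(42)
X_demo = rng.normal(0, 1, size=(1000, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# The pipeline fits the scaler on the training data, then trains the network
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(5,), activation='logistic',
                  solver='lbfgs', max_iter=500, random_state=42),
)
mlp.fit(X_tr, y_tr)

mlp_accuracy = mlp.score(X_te, y_te)
print(f"MLPClassifier test accuracy: {mlp_accuracy:.3f}")
```

Wrapping the scaler and the model together means `score` and `predict` apply the same scaling automatically, which removes the need for separate `X_train_scaled` and `X_test_scaled` arrays.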

In [None]:
# Visualize neural network training results
plt.figure(figsize=(12, 4))

# Plot training loss
plt.subplot(1, 3, 1)
plt.plot(nn.loss_history)
plt.title('Training Loss Over Time')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True, alpha=0.3)

# Plot decision boundary
plt.subplot(1, 3, 2)
h = 0.02
x_min, x_max = X_train_scaled[:, 0].min() - 1, X_train_scaled[:, 0].max() + 1
y_min, y_max = X_train_scaled[:, 1].min() - 1, X_train_scaled[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

mesh_points = np.c_[xx.ravel(), yy.ravel()]
Z = nn.forward(mesh_points)
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, levels=50, alpha=0.8, cmap='RdYlBu')
scatter = plt.scatter(X_train_scaled[:, 0], X_train_scaled[:, 1], c=y_train, 
                     cmap='RdYlBu', edgecolors='black')
plt.colorbar(scatter)
plt.title('Neural Network Decision Boundary')
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')

# Compare all models
plt.subplot(1, 3, 3)
all_results = results.copy()
all_results['Neural Network'] = test_accuracy

model_names = list(all_results.keys())
accuracies = list(all_results.values())
colors = ['skyblue', 'lightgreen', 'lightcoral']

bars = plt.bar(model_names, accuracies, color=colors)
plt.title('All Models Comparison')
plt.ylabel('Accuracy')
plt.ylim(0, 1)
plt.xticks(rotation=45)

for bar, accuracy in zip(bars, accuracies):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
             f'{accuracy:.3f}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

print("Neural network training visualization complete!")

## Summary and Next Steps

Congratulations! 🎉 You've completed this comprehensive AI tutorial. Here's what you've learned:

### Key Concepts Covered:
1. **Python for Data Science**: NumPy arrays, Pandas DataFrames, data manipulation
2. **Data Visualization**: Creating informative plots with matplotlib and seaborn
3. **Machine Learning**: Classification algorithms, model evaluation, decision boundaries
4. **Neural Networks**: Forward/backward propagation, activation functions, training process

### What You Can Do Next:
- 🔬 **Experiment**: Try different datasets and parameters
- 📚 **Learn More**: Explore deep learning frameworks like TensorFlow or PyTorch
- 🏗️ **Build Projects**: Create your own AI applications
- 🌐 **Join Communities**: Participate in Kaggle competitions or AI forums
- 📖 **Advanced Topics**: Study CNNs for images, RNNs for sequences, or Transformers for NLP

### Resources for Continued Learning:
- **Online Courses**: Coursera, edX, Udacity AI courses
- **Books**: "Hands-On Machine Learning" by Aurélien Géron
- **Datasets**: Kaggle, UCI ML Repository, Google Dataset Search
- **Practice**: Work on real-world projects and build a portfolio

Remember: The best way to learn AI is by doing! Keep experimenting and building. 🚀

In [None]:
# Final celebration!
print("🎉 CONGRATULATIONS! 🎉")
print("You've successfully completed the AI Tutorial by AI!")
print("")
print("📊 Models you've trained:")
for name, accuracy in all_results.items():
    print(f"   • {name}: {accuracy:.1%} accuracy")
print("")
print("🚀 You're now ready to explore the exciting world of AI!")
print("Keep learning, keep building, and keep pushing the boundaries of what's possible!")