# From Python to Reinforcement Learning for Beginners

### Lesson Plan:
1. Python Basics
2. Control Flow & Functions
3. Importing and Using Modules
4. NumPy & Basic Math
5. Introduction to Reinforcement Learning
6. Simple RL Example

## 1. Python Basics (30 minutes)

In [None]:
# Variables and Data Types
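The cell above is left empty in the lesson. One possible way to fill it is shown below; the variable names and values are illustrative, not part of the original plan:

```python
# Variables are created by assignment; Python infers the type automatically
name = "Alice"        # str: text
age = 25              # int: whole number
height = 1.68         # float: decimal number
is_student = True     # bool: True or False

print(name, age, height, is_student)
print(type(name), type(age), type(height), type(is_student))
```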


In [None]:
# Lists (arrays)

# print the lists

# Adding to list
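A minimal sketch of this cell, following its three comments (create, print, add); the list contents are made up for illustration:

```python
# A list holds an ordered sequence of values
fruits = ["apple", "banana", "cherry"]
numbers = [1, 2, 3, 4, 5]

# Print the lists, and access items by index (indexing starts at 0)
print(fruits)
print(numbers)
print(fruits[0])    # first item: apple
print(fruits[-1])   # last item: cherry

# Add an item to the end of a list
fruits.append("orange")
print(fruits)       # ['apple', 'banana', 'cherry', 'orange']
print(len(fruits))  # 4
```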


In [None]:
# Dictionaries (key-value pairs)
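One way this cell could look; the `student` dictionary and its keys are hypothetical examples:

```python
# A dictionary maps keys to values
student = {"name": "Alice", "age": 25, "grade": "A"}

print(student["name"])     # look up a value by its key: Alice
student["age"] = 26        # update an existing key
student["city"] = "Paris"  # add a new key-value pair

# Loop over all key-value pairs
for key, value in student.items():
    print(key, ":", value)
```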


## 2. Control Flow & Functions (20 minutes)

In [None]:
# If-else statements
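A possible example for this cell, using an illustrative temperature value. Python checks each condition from top to bottom and runs only the first branch that matches:

```python
temperature = 22

if temperature > 30:
    feeling = "hot"
elif temperature > 15:
    feeling = "warm"      # this branch runs, because 22 > 15
else:
    feeling = "cold"

print(f"{temperature}°C feels {feeling}")  # 22°C feels warm
```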


In [None]:
# For loops
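A sketch of common `for` loop patterns that this cell might cover; the list and the range are example values:

```python
# Loop over the items of a list
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)

# range(5) produces 0, 1, 2, 3, 4
total = 0
for i in range(5):
    total = total + i
print("Sum of 0..4:", total)   # 10

# enumerate() gives both the index and the item
for index, fruit in enumerate(fruits):
    print(index, fruit)
```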


In [None]:
# Functions
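A minimal example of defining and calling functions; `greet` and `add` are hypothetical names chosen for illustration:

```python
def greet(name):
    """Return a greeting for the given name."""
    return f"Hello, {name}!"

def add(a, b=0):
    """Add two numbers; b defaults to 0 if it is not given."""
    return a + b

print(greet("Alice"))  # Hello, Alice!
print(add(3, 4))       # 7
print(add(3))          # 3
```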


## 3. Importing and Using Modules

In [None]:
# Import the built-in random module
import random

# random.random() returns a floating-point number ≥ 0.0 and < 1.0.
# Useful for simulating probabilities (e.g., chance of success or failure).
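A sketch of the probability idea in the comments above: an event "succeeds" when `random.random()` falls below its probability. The 30% success chance and the 10,000 repetitions are arbitrary example values:

```python
import random

# A float in [0.0, 1.0)
x = random.random()
print(x)

# Simulate one event that succeeds with a 30% chance
success_probability = 0.3
if random.random() < success_probability:
    print("Success!")
else:
    print("Failure")

# Repeat the experiment many times: the observed rate should be close to 0.3
random.seed(0)  # fixed seed so the result is reproducible
n = 10_000
successes = sum(1 for _ in range(n) if random.random() < success_probability)
print("Observed success rate:", successes / n)
```

This is the same trick the slot-machine environment in Section 6 uses to decide whether a pull wins.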


### Create a guessing game

In [None]:

import random

# Generate a random number between 1 and 100 (inclusive)
secret = random.randint(1, 100)
attempts = 0

while True:
    guess = int(input("Guess a number between 1 and 100: "))  # let the user input a number
    attempts = attempts + 1
    if guess < secret:
        print("Too low")
    elif guess > secret:
        print("Too high")
    else:
        print(f"Correct! The number was {secret}; you needed {attempts} attempts.")
        break

## 4. NumPy & Basic Math (20 minutes)

In [None]:
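# If NumPy is not installed yet, run `pip install numpy` in a terminal
# (or `%pip install numpy` in a notebook cell), then import it:
import numpy as np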


# Creating arrays
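A possible set of array-creation examples for this cell; the numbers are illustrative:

```python
import numpy as np

# From a Python list
a = np.array([1, 2, 3])
print(a, a.shape, a.dtype)

# A 2D array (matrix) from a list of lists
m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m.shape)               # (2, 3): 2 rows, 3 columns

# Handy constructors
print(np.zeros(3))           # [0. 0. 0.]
print(np.ones((2, 2)))       # 2x2 array filled with 1.0
print(np.arange(0, 10, 2))   # [0 2 4 6 8]

# Element-wise arithmetic: no explicit loop needed
print(a * 2)                 # [2 4 6]
print(a + a)                 # [2 4 6]
```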


## 🔢 Understanding the Dot Product with NumPy

The **dot product** (also known as the **scalar product**) is a way to multiply two vectors and get a **single scalar value**.

### 📘 Formula

For two vectors **A** and **B** of the same length *n*:

dot(A, B) = A₁ * B₁ + A₂ * B₂ + A₃ * B₃ + ... + Aₙ * Bₙ
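The formula above can be checked in NumPy. The vectors below are arbitrary example values; the manual sum follows the formula term by term, and `np.dot` and the `@` operator give the same result:

```python
import numpy as np

A = np.array([1, 2, 3])
B = np.array([4, 5, 6])

# Manual computation: 1*4 + 2*5 + 3*6 = 4 + 10 + 18
manual = sum(a * b for a, b in zip(A, B))

print(manual)          # 32
print(np.dot(A, B))    # 32
print(A @ B)           # 32
```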

## 🔄 Matrix Multiplication in NumPy

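Matrix multiplication combines two 2D arrays using the dot product of rows and columns.
Each element in the resulting matrix is the dot product of a row from matrix A and a column from matrix B. This only works when the number of columns of A equals the number of rows of B: an (m×n) matrix times an (n×p) matrix gives an (m×p) matrix. In NumPy, write `A @ B` or `np.matmul(A, B)`.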

### 👇 Example 1:

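Let:

$$
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad
B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}
$$

Then:

$$
A \times B =
\begin{bmatrix} 1\times5 + 2\times7 & 1\times6 + 2\times8 \\ 3\times5 + 4\times7 & 3\times6 + 4\times8 \end{bmatrix}
= \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}
$$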

### 👇 Example 2: (2×3) @ (3×2) = (2×2)

Let:

$$
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad
B = \begin{bmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{bmatrix}
$$

Then:

$$
A \times B =
\begin{bmatrix} 1\times7 + 2\times9 + 3\times11 & 1\times8 + 2\times10 + 3\times12 \\ 4\times7 + 5\times9 + 6\times11 & 4\times8 + 5\times10 + 6\times12 \end{bmatrix}
= \begin{bmatrix} 58 & 64 \\ 139 & 154 \end{bmatrix}
$$
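Both worked examples can be verified with NumPy. This sketch reuses the matrices from the examples and also recomputes one entry by hand as "row of A dotted with column of B":

```python
import numpy as np

# Example 1: (2x2) @ (2x2) -> (2x2)
A1 = np.array([[1, 2], [3, 4]])
B1 = np.array([[5, 6], [7, 8]])
C1 = A1 @ B1
print(C1.tolist())   # [[19, 22], [43, 50]]

# Example 2: (2x3) @ (3x2) -> (2x2)
A2 = np.array([[1, 2, 3], [4, 5, 6]])
B2 = np.array([[7, 8], [9, 10], [11, 12]])
C2 = np.matmul(A2, B2)   # same as A2 @ B2
print(C2.shape)          # (2, 2)
print(C2.tolist())       # [[58, 64], [139, 154]]

# One entry by hand: row 0 of A2 dotted with column 1 of B2
print(np.dot(A2[0, :], B2[:, 1]))  # 64
```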

In [None]:
# Matrix operations - 1
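The cell above is empty in the lesson. One plausible use for it is contrasting element-wise operators with matrix multiplication, a common beginner pitfall: `*` multiplies position by position, while `@` does rows times columns. The matrices are the ones from Example 1:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Element-wise operations work position by position
print((A + B).tolist())   # [[6, 8], [10, 12]]
print((A * B).tolist())   # [[5, 12], [21, 32]]  (NOT matrix multiplication)

# Matrix multiplication: rows of A times columns of B
print((A @ B).tolist())   # [[19, 22], [43, 50]]
```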



In [None]:
# Matrix operations - 2
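A possible second set of matrix operations: transpose, the shape rule for `@`, and simple aggregations. The matrix `A` is an illustrative example:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

# Transpose swaps rows and columns
print(A.T.shape)            # (3, 2)

# (2x3) @ (3x2) works because the inner dimensions match (3 == 3)
print((A @ A.T).tolist())   # [[14, 32], [32, 77]]

# (2x3) @ (2x3) fails: the inner dimensions are 3 and 2
try:
    A @ A
except ValueError as err:
    print("Shape error:", err)

# Aggregations over the whole array or along an axis
print(A.sum(), A.sum(axis=0).tolist(), A.mean())  # 21 [5, 7, 9] 3.5
```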

In [None]:
# NumPy array slicing
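A sketch of basic slicing for this cell; the arrays are example data. Slices use `start:stop:step`, where `start` is included and `stop` is excluded:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4])     # [20 30 40]
print(arr[:2])      # [10 20]
print(arr[-2:])     # [40 50]
print(arr[::2])     # [10 30 50]  (every second element)

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(m[0, :])      # first row: [1 2 3]
print(m[:, 1])      # second column: [2 5 8]
print(m[1:, 1:])    # bottom-right 2x2 block

# Boolean masks select the elements that satisfy a condition
print(arr[arr > 25])  # [30 40 50]
```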


In [None]:
# Random numbers (important for RL)
# Generate 3 random floats in [0.0, 1.0)


# Random integer from 0 to 99
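One way to fill this cell using NumPy's random functions, which Section 6 also relies on. The action names are hypothetical, and the final lines show the newer `default_rng` Generator interface as an alternative:

```python
import numpy as np

# Three random floats in [0.0, 1.0)
floats = np.random.random(3)
print(floats)

# One random integer from 0 to 99 (the upper bound 100 is excluded)
number = np.random.randint(0, 100)
print(number)

# Pick a random action from a list, as an RL agent does when it explores
actions = ["left", "right", "up", "down"]
print(np.random.choice(actions))

# Newer NumPy code often uses a Generator; a fixed seed makes results repeatable
rng = np.random.default_rng(seed=42)
print(rng.random(3), rng.integers(0, 100))
```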


## 5. Introduction to Reinforcement Learning (30 minutes)

### What is Reinforcement Learning?

Reinforcement Learning (RL) is a type of machine learning where an **agent** learns to make decisions by performing **actions** in an **environment** to maximize some notion of **reward**.

Key concepts:
- **Agent**: The learner/decision maker
- **Environment**: The world the agent interacts with
- **State**: Current situation of the agent
- **Action**: What the agent can do
- **Reward**: Feedback from the environment
- **Policy**: Strategy the agent uses to choose actions

![RL Diagram](https://www.guru99.com/images/1/082319_0514_Reinforceme3.png)

### Simple Example: Multi-Armed Bandit

Imagine you have 3 slot machines ("one-armed bandits") with different payout probabilities. How do you learn which one is best?

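- **State**: In the basic bandit problem you face the same row of machines every time, so there is effectively a single state; this is what makes bandits the simplest RL setting
- **Actions**: Which machine to play
- **Reward**: 1 if the machine pays out, 0 if it does not
- **Policy**: Strategy for choosing which machine to play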

## 6. Simple RL Example (20 minutes)

In [None]:
# Implementing a simple bandit problem
import numpy as np

# True probability of winning for each machine
true_probs = [0.3, 0.5, 0.7]  # Machine 0, 1, 2

class Bandit:
    def __init__(self, true_probs):
        self.true_probs = true_probs
        
    def pull(self, arm):
        # Return 1 with probability of the arm, else 0
        return 1 if np.random.random() < self.true_probs[arm] else 0

# Create our bandit environment
bandit = Bandit(true_probs)

In [None]:
# Simple RL agent - Epsilon-Greedy
class RLAgent:
    def __init__(self, n_arms, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon  # Exploration rate
        self.q_values = np.zeros(n_arms)  # Estimated value of each arm
        self.counts = np.zeros(n_arms)    # Number of times each arm was pulled
    
    def choose_action(self):
        if np.random.random() < self.epsilon:
            # Explore: choose random arm
            return np.random.randint(self.n_arms)
        else:
            # Exploit: choose best known arm
            return np.argmax(self.q_values)
    
    def update(self, arm, reward):
        # Update the estimated value of the arm
        self.counts[arm] += 1
        self.q_values[arm] += (reward - self.q_values[arm]) / self.counts[arm]
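The update `q += (reward - q) / count` is an incremental way of computing the average of all rewards seen for an arm, so the agent never has to store its full reward history. A quick check with a made-up reward sequence for one arm:

```python
import numpy as np

rewards = [1, 0, 0, 1, 1]   # hypothetical rewards observed from one arm

q = 0.0      # estimate starts at 0, like RLAgent.q_values
count = 0
for r in rewards:
    count += 1
    q += (r - q) / count    # same rule as RLAgent.update
    print(f"after {count} pulls: estimate = {q:.3f}")

print("Incremental estimate:", q)
print("Plain average:", np.mean(rewards))   # both are (approximately) 0.6
```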

In [None]:
# Let's train our agent!
agent = RLAgent(n_arms=3, epsilon=0.1)
n_trials = 1000
rewards = []

for _ in range(n_trials):
    arm = agent.choose_action()
    reward = bandit.pull(arm)
    agent.update(arm, reward)
    rewards.append(reward)

print("Estimated Q-values:", agent.q_values)
print("True probabilities:", true_probs)
print("Total reward:", sum(rewards))
print("Optimal arm found:", np.argmax(agent.q_values))

### What Happened?

1. Our agent started with no knowledge (all Q-values = 0)
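2. On every pull it explored with probability 10% (`epsilon=0.1`) by choosing a random machine; the other 90% of the time it exploited by choosing the machine with the highest estimated Q-value
3. Each reward updated the Q-value of the chosen machine, so the estimates moved toward the true win rates
4. After enough trials it usually settles on the best machine (arm 2, with a 70% win rate), although a single run is random and is not guaranteed to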

This is the essence of RL: Learning through trial and error!
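To see why exploration matters, here is a sketch that re-implements the same epsilon-greedy logic compactly (using NumPy's `default_rng` Generator instead of the notebook's classes, and a hypothetical helper `run`) and compares several epsilon values. With `epsilon=0.0` the agent never explores: all estimates start at 0 and `np.argmax` returns the first index on ties, so it keeps pulling arm 0 forever. The exact numbers for other epsilon values depend on the seed:

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded so runs are repeatable
true_probs = [0.3, 0.5, 0.7]
best_arm = int(np.argmax(true_probs))

def run(epsilon, n_trials=1000):
    """Epsilon-greedy on the 3-armed bandit.

    Returns (total reward, share of pulls spent on the best arm).
    """
    q = np.zeros(len(true_probs))
    counts = np.zeros(len(true_probs))
    total, best_pulls = 0, 0
    for _ in range(n_trials):
        if rng.random() < epsilon:
            arm = int(rng.integers(len(true_probs)))   # explore
        else:
            arm = int(np.argmax(q))                    # exploit
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]
        total += reward
        best_pulls += int(arm == best_arm)
    return total, best_pulls / n_trials

for eps in [0.0, 0.1, 0.3]:
    total, share = run(eps)
    print(f"epsilon={eps}: total reward={total}, best-arm share={share:.2f}")
```

Too little exploration can lock the agent onto a poor arm; too much wastes pulls on arms it already knows are worse.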

## Next Steps

If you enjoyed this, you might explore:
- More complex RL environments (like CartPole)
- Deep Reinforcement Learning (combining neural networks with RL)
- Applications: Game playing (AlphaGo), robotics, recommendation systems

### Resources
- [Reinforcement Learning Introduction (YouTube)](https://www.youtube.com/watch?v=JgvyzIkgxF0)
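- [OpenAI Gym (RL environments)](https://gym.openai.com/) (Gym is no longer actively maintained; its successor is Gymnasium, maintained by the Farama Foundation)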
- [Python for Beginners](https://www.python.org/about/gettingstarted/)