# **What is Probability?**

**Probability** is a measure of the likelihood of an event occurring. It quantifies uncertainty and ranges between **0** and **1**:
- **0**: The event will definitely not happen.
- **1**: The event will definitely happen.

---

### **Key Concepts in Probability**

1. **Experiment**: An action or process that produces an outcome.  
   - Example: Tossing a coin.

2. **Sample Space (\( S \))**: The set of all possible outcomes.  
   - Example: For a coin toss, \( S = \{ \text{Heads, Tails} \} \).

3. **Event**: A specific outcome or set of outcomes from the sample space.  
   - Example: Getting "Heads" when tossing a coin.

4. **Probability of an Event (\( P(E) \))**: The likelihood of an event happening. When all outcomes are equally likely, it is calculated as:  
   \[
   P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}
   \]

---

### **Simple Example**

#### **Tossing a Coin**
- **Sample Space (\( S \))**: \( \{ \text{Heads, Tails} \} \).
- **Favorable Outcome for Heads (\( E \))**: \( \{ \text{Heads} \} \).

**Probability of Getting Heads**:
\[
P(\text{Heads}) = \frac{\text{Number of favorable outcomes}}{\text{Total outcomes}} = \frac{1}{2} = 0.5
\]

#### **Rolling a Die**
- **Sample Space (\( S \))**: \( \{1, 2, 3, 4, 5, 6\} \).
- **Favorable Outcome for Getting 4 (\( E \))**: \( \{4\} \).

**Probability of Getting a 4**:
\[
P(4) = \frac{1}{6} \approx 0.1667
\]
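
The counting formula can be checked directly in code. The sketch below is a minimal illustration that assumes all outcomes are equally likely; the helper name `probability` is hypothetical and not part of any library.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Return P(E) = |E| / |S|, assuming equally likely outcomes."""
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

coin = {"Heads", "Tails"}
die = {1, 2, 3, 4, 5, 6}

p_heads = probability({"Heads"}, coin)
p_four = probability({4}, die)

print(p_heads, float(p_heads))          # 1/2 0.5
print(p_four, round(float(p_four), 4))  # 1/6 0.1667
```

Using `Fraction` keeps the results exact, so \( \frac{1}{6} \) is only rounded when it is displayed.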

---

### **Types of Probability**

1. **Theoretical Probability**:
   - Based on logical reasoning and equally likely outcomes.
   - Example: Probability of rolling a 3 on a die = \( \frac{1}{6} \).

2. **Experimental Probability**:
   - Based on actual experiments or observations.
   - Example: Toss a coin 100 times, and if Heads appears 55 times, the probability of Heads = \( \frac{55}{100} = 0.55 \).

3. **Subjective Probability**:
   - Based on personal judgment or experience.
   - Example: A weatherman predicting a 70% chance of rain tomorrow.
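
Experimental probability can be estimated with a short simulation. The following is a minimal sketch, assuming a fair coin and a fair six-sided die; the helper `relative_frequencies` and the seed value are illustrative choices, and the exact numbers depend on the seed.

```python
import random
from collections import Counter

def relative_frequencies(outcomes, trials, rng):
    """Simulate `trials` equally likely draws and return outcome -> relative frequency."""
    counts = Counter(rng.choice(outcomes) for _ in range(trials))
    return {outcome: counts[outcome] / trials for outcome in outcomes}

rng = random.Random(42)  # fixed seed (an arbitrary choice) so the run is repeatable
trials = 1000

coin_probs = relative_frequencies(["Heads", "Tails"], trials, rng)
dice_probs = relative_frequencies([1, 2, 3, 4, 5, 6], trials, rng)

print("Coin Toss Probabilities:")
print(coin_probs)
print("\nDice Roll Probabilities:")
print(dice_probs)
```

Each printed value is a count divided by the number of trials, so the values in each dictionary sum to 1.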



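### **Output Example**
A simulation of 1,000 coin tosses and 1,000 die rolls produces relative frequencies like these (exact values vary from run to run):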
```
Coin Toss Probabilities:
{'Heads': 0.502, 'Tails': 0.498}

Dice Roll Probabilities:
{1: 0.166, 2: 0.168, 3: 0.162, 4: 0.169, 5: 0.166, 6: 0.169}
```

---

### **Insights**
1. **Coin Toss**:
   - Theoretical probability: \( 0.5 \) for Heads or Tails.
   - Experimental probability: Close to \( 0.5 \) after 1,000 trials.

2. **Dice Roll**:
   - Theoretical probability: \( \frac{1}{6} \approx 0.1667 \) for each outcome.
   - Experimental probability: Each outcome occurs roughly \( 16.67\% \) of the time after 1,000 trials.

---

### **Key Takeaways**
- **Probability** provides a framework to quantify uncertainty.
- **Theoretical probabilities** are based on logic, while **experimental probabilities** are based on real-world data.
- As the number of trials grows, experimental probabilities tend to move closer to the theoretical ones (Law of Large Numbers).
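
The Law of Large Numbers can be observed by tracking the share of Heads as the number of tosses grows. This is a rough sketch, assuming a fair coin; the seed and the checkpoint sizes are arbitrary choices made for illustration.

```python
import random

rng = random.Random(0)  # fixed seed, chosen arbitrarily, for repeatability

heads = 0
tosses = 0
estimates = {}
for n in (10, 100, 1_000, 10_000, 100_000):
    # Extend the same run of tosses until n tosses have been made in total
    while tosses < n:
        heads += rng.random() < 0.5  # True counts as 1 (Heads)
        tosses += 1
    estimates[n] = heads / n
    print(f"{n:>7} tosses: P(Heads) is approximately {estimates[n]:.4f}")
```

Small samples can land far from 0.5, while the estimate after 100,000 tosses should sit very close to it.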


### **What is a Frequency Distribution Table?**

A **frequency distribution table** organizes data into categories or intervals and displays the frequency (count) of occurrences for each category or interval. It is a simple way to summarize data and understand its distribution.

---

### **Key Components of a Frequency Distribution Table**

1. **Data Categories or Intervals**:
   - For categorical data: Unique categories (e.g., colors, days).
   - For numerical data: Defined intervals (e.g., 0-10, 11-20).

2. **Frequency**:
   - The number of occurrences for each category or interval.

---

### **Why Use a Frequency Distribution Table?**
- Simplifies large datasets.
- Makes it easier to identify patterns, trends, or outliers.
- Serves as a foundation for visualizations like histograms or bar charts.

---

### **Simple Example**

#### **Categorical Data Example**
Suppose we have a list of fruits sold:
\[ \text{[Apple, Orange, Banana, Apple, Orange, Apple]} \]

**Frequency Distribution Table**:
| Fruit   | Frequency |
|---------|-----------|
| Apple   | 3         |
| Orange  | 2         |
| Banana  | 1         |

#### **Numerical Data Example**
Suppose we have the scores of 10 students:
\[ \text{[5, 7, 8, 5, 9, 6, 7, 8, 10, 5]} \]

We divide the scores into intervals:
- 0-5
- 6-10

**Frequency Distribution Table**:
| Score Range | Frequency |
|-------------|-----------|
| 0-5         | 3         |
| 6-10        | 7         |

---

### **Python Code Example**

#### **For Categorical Data**
```python
import pandas as pd

# Categorical data: Fruits sold
data = ['Apple', 'Orange', 'Banana', 'Apple', 'Orange', 'Apple']

# Create frequency distribution
frequency_table = pd.Series(data).value_counts()

print("Frequency Distribution Table for Fruits:")
print(frequency_table)
```
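
#### **For Numerical Data**

The score table from the numerical example can be built the same way by binning the values first. This is one possible sketch using `pd.cut`; the bin edges and labels are chosen to match the 0-5 and 6-10 intervals used earlier.

```python
import pandas as pd

# Numerical data: scores of 10 students
scores = [5, 7, 8, 5, 9, 6, 7, 8, 10, 5]

# Bin edges 0, 5, 10 give the intervals [0, 5] and (5, 10];
# include_lowest=True closes the first interval on the left
binned = pd.cut(scores, bins=[0, 5, 10], labels=["0-5", "6-10"], include_lowest=True)

# Count the scores in each interval, keeping the intervals in order
frequency_table = pd.Series(binned).value_counts().sort_index()

print("Frequency Distribution Table for Scores:")
print(frequency_table)
```

A score of exactly 5 falls in the first interval because `pd.cut` closes each bin on the right by default.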



### **Key Takeaways**
1. **Categorical Data**:
   - Unique categories like "Apple" and "Orange" are counted.
2. **Numerical Data**:
   - Intervals or bins group numerical values for better summarization.
3. **Visualization**:
   - Bar charts make it easy to interpret frequency distributions.


### **What is a Random Variable?**

A **random variable** is a numerical representation of the outcomes of a random experiment. It assigns numbers to the possible outcomes, allowing us to analyze and calculate probabilities.

---

### **1. Discrete Random Variable**

A **discrete random variable** is a random variable that takes on a **countable** number of distinct values. These values are often integers.

#### **Key Characteristics**
- The outcomes can be **listed or counted**.
- It’s used for scenarios where you count things (e.g., people, occurrences, items).

#### **Examples**
1. **Number of Heads in 3 Coin Tosses**:
   - Outcomes: $ 0, 1, 2, 3 $ heads.
   - Random Variable ($ X $): Number of heads.
   - $ X $ is discrete because it only takes on specific values: $ 0, 1, 2, $ or $ 3 $.

2. **Number of Passengers in a Bus**:
   - Possible values: $ 0, 1, 2, \dots $.
   - $ X $ is discrete because you can’t have 2.5 passengers.

---

### **2. Continuous Random Variable**

A **continuous random variable** is a random variable that can take on **any value** within a given range, so its possible values are uncountably infinite. These values are real numbers.

#### **Key Characteristics**
- The outcomes are **measured** rather than counted.
- It’s used for scenarios involving measurements (e.g., height, weight, temperature).

#### **Examples**
1. **Height of Students in a Class**:
   - Possible values: $ 150.5 \, \text{cm}, 170.2 \, \text{cm}, 175.8 \, \text{cm}, \dots $.
   - Random Variable ($ X $): Height of a student.
   - $ X $ is continuous because it can take any value within a range, like $ 150.0 $ to $ 200.0 \, \text{cm} $.

2. **Time Taken to Finish a Race**:
   - Possible values: $ 9.87 \, \text{seconds}, 10.25 \, \text{seconds}, 10.5 \, \text{seconds}, \dots $.
   - $ X $ is continuous because it can take any value in the range.

---

### **Key Differences: Discrete vs. Continuous**

| **Feature**                | **Discrete Random Variable**               | **Continuous Random Variable**             |
|----------------------------|--------------------------------------------|--------------------------------------------|
| **Values**                 | Countable (e.g., 0, 1, 2, …)               | Any value within a range (e.g., 0.1, 1.23) |
| **Example**                | Number of heads in coin tosses             | Height of students in a class              |
| **Representation**         | Probability mass function (PMF)            | Probability density function (PDF)         |
| **Key Operation**          | Probabilities for specific values          | Probabilities for ranges of values         |

---

### **Illustration Example**

#### **Coin Toss (Discrete)**
- Toss a coin 3 times.
- Possible outcomes: $ 0, 1, 2, 3 $ heads.
- Random Variable $ X $: Number of heads.

#### **Height of Students (Continuous)**
- Measure students’ heights.
- Possible outcomes: Any value in a range like $ 150.0 \, \text{cm} $ to $ 200.0 \, \text{cm} $.
- Random Variable $ Y $: Height.
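
The two cases can also be compared numerically. The sketch below computes the exact probabilities for the number of heads in 3 tosses of a fair coin, and a range probability for height. The normal model for height, with mean 170 cm and standard deviation 10 cm, is purely an assumption for illustration and is not taken from any dataset.

```python
from math import comb
from statistics import NormalDist

# Discrete: X = number of heads in 3 tosses of a fair coin.
# Each of the 2**3 = 8 toss sequences is equally likely, and comb(3, k) of them have k heads.
pmf = {k: comb(3, k) / 2**3 for k in range(4)}
print(pmf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# Continuous: Y = height in cm. The normal model and its parameters
# (mean 170, standard deviation 10) are assumptions made for illustration only.
height = NormalDist(mu=170, sigma=10)

# P(160 <= Y <= 180) is the area under the density between 160 and 180
p_range = height.cdf(180) - height.cdf(160)
print(round(p_range, 4))  # 0.6827

# For a continuous variable the density at a point is not a probability
print(height.pdf(170))
```

The discrete variable gets a probability for each individual value, while the continuous variable only gets probabilities for ranges of values.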

---


### **Output Explanation**

1. **Left Plot (Discrete Random Variable)**:
   - Probabilities are shown for each possible value of $ X $ (e.g., $ P(X = 0), P(X = 1) $).

2. **Right Plot (Continuous Random Variable)**:
   - A smooth curve represents the probability density of heights.
   - Probability is calculated for a range (e.g., between $ 160 $ and $ 180 $).

---

### **Key Takeaways**
- **Discrete Random Variables** are for counting specific outcomes.
- **Continuous Random Variables** are for measuring ranges of values.
- Both are essential for analyzing real-world randomness in different ways.




### **How It Works**

1. **The Experiment**:
   - You flip a coin.
   - Possible outcomes: Heads (H), Tails (T).

2. **Defining the Random Variable**:
   - Let’s call the random variable $ X $.
   - $ X $ is a rule that assigns a **number** to each outcome:
     - If Heads: $ X = 1 $
     - If Tails: $ X = 0 $

3. **Probability of the Random Variable**:
   - Now, we calculate the probability of $ X $:
     - $ P(X = 1) = 0.5 $ (Heads happens in half the flips).
     - $ P(X = 0) = 0.5 $ (Tails happens in half the flips).

So, in this case:
- The **random variable** is $ X $.
- The **probability** of $ X = 1 $ (Heads) is $ 0.5 $.
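
The idea that a random variable is a rule from outcomes to numbers can be written out as a small mapping. This is a minimal sketch, assuming a fair coin; the names `sample_space`, `X`, and `pmf` are illustrative.

```python
from fractions import Fraction

# Sample space of one flip of a fair coin (equally likely outcomes assumed)
sample_space = ["H", "T"]

# The random variable X is a rule mapping each outcome to a number
X = {"H": 1, "T": 0}

# P(X = x) is the share of outcomes that X maps to x
pmf = {}
for outcome in sample_space:
    value = X[outcome]
    pmf[value] = pmf.get(value, 0) + Fraction(1, len(sample_space))

print(pmf[1], pmf[0])  # 1/2 1/2
```

The same loop works for any finite sample space with equally likely outcomes, such as mapping the 8 sequences of 3 tosses to their number of heads.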

---

### **Why This Matters**

A **random variable** is a way to:
1. Assign numbers to random outcomes.
2. Study the probabilities of those numbers.

---

### **In Simple Words**
- The **random variable** is $ X $, which equals $ 1 $ for Heads and $ 0 $ for Tails.
- The **probability of $ X = 1 $** is $ 0.5 $.



# **Random Variable Example: Passenger Class (Pclass) in Titanic Dataset**

Let’s use the **Pclass** column in the Titanic dataset as an example of a random variable.

---

### **What Is the Random Experiment?**
- The random experiment is selecting a passenger at random from the Titanic dataset.
- Possible outcomes are the passenger classes: **1st class, 2nd class, 3rd class**.

---

### **Defining the Random Variable**
We define a random variable \( X \) as the **Pclass** of the selected passenger. The values of \( X \) are:
- \( X = 1 \): The passenger is in **1st class**.
- \( X = 2 \): The passenger is in **2nd class**.
- \( X = 3 \): The passenger is in **3rd class**.

---

### **Probability of the Random Variable**
The probability of each value of \( X \) depends on how many passengers are in each class. For example:
- \( P(X = 1) = \frac{\text{Number of passengers in 1st class}}{\text{Total passengers}} \)
- \( P(X = 2) = \frac{\text{Number of passengers in 2nd class}}{\text{Total passengers}} \)
- \( P(X = 3) = \frac{\text{Number of passengers in 3rd class}}{\text{Total passengers}} \)

---

### **Python Code Example**

Let’s calculate the probabilities for \( X \) using the Titanic dataset.

```python
import seaborn as sns
import pandas as pd

# Load Titanic dataset
titanic = sns.load_dataset('titanic')

# Frequency of each class (value counts for Pclass)
pclass_counts = titanic['pclass'].value_counts()

# Total number of passengers
total_passengers = len(titanic)

# Probability distribution of the random variable Pclass
pclass_probabilities = pclass_counts / total_passengers

# Display the results
print("Frequency of Passenger Classes (Pclass):")
print(pclass_counts)

print("\nProbability Distribution of Pclass:")
print(pclass_probabilities)
```

---

### **Output Example**

#### **Frequency of Passenger Classes**
```
Frequency of Passenger Classes (Pclass):
3    491
1    216
2    184
Name: pclass, dtype: int64
```

#### **Probability Distribution**
```
Probability Distribution of Pclass:
3    0.551066
1    0.242424
2    0.206510
Name: pclass, dtype: float64
```

---

### **Interpretation**

1. The random variable \( X \) represents the **Pclass** of a passenger.
2. The probabilities are:
   - \( P(X = 3) = 0.551 \): About 55% of passengers are in 3rd class.
   - \( P(X = 1) = 0.242 \): About 24% of passengers are in 1st class.
   - \( P(X = 2) = 0.207 \): About 21% of passengers are in 2nd class.
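
These figures can be rechecked without downloading the dataset. The sketch below starts from the class counts shown in the frequency output above and confirms that the resulting probabilities sum to 1; it uses only plain Python.

```python
# Class counts copied from the frequency output above
pclass_counts = {3: 491, 1: 216, 2: 184}

total_passengers = sum(pclass_counts.values())  # 891

# P(X = k) = count for class k / total passengers
pclass_probabilities = {k: count / total_passengers for k, count in pclass_counts.items()}

for k in sorted(pclass_probabilities):
    print(f"P(X = {k}) = {pclass_probabilities[k]:.3f}")

# A valid probability distribution sums to 1 (up to floating-point rounding)
print(sum(pclass_probabilities.values()))
```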

---

### **Key Takeaways**
- **Pclass** is treated as a **random variable** that takes values \( 1, 2, \) or \( 3 \).
- We calculate the probability of each class by dividing its count by the total number of passengers.
- This probability distribution helps us understand the likelihood of a passenger being in each class.
