# CSCI632 Homework 3: Probability

### Instructions
This assignment is designed to cover key probability concepts used throughout *Deep Learning* (Goodfellow et al., Ch. 3).
Show all steps and justify each answer. You do not need to use LaTeX for these answers.


### Deliverables
- A Jupyter notebook, scanned image, or PDF of your typed, LaTeX, or handwritten solutions.
- For conceptual questions, write concise but clear explanations.
- For computational problems, show all steps.


### Part A. Venn diagram problems — draw and reason

For each problem below, draw a Venn diagram (label all regions) and answer the questions. 
Show any probability calculations and briefly justify your conclusion about 
independence / disjointness / subset relationships.
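
As a rough illustration of the checks asked for in this part, the sketch below turns three given numbers into region probabilities and independence / disjointness flags. The function name `describe_pair` and the input values are made up for illustration and are not taken from any problem in this assignment; a numeric check like this does not replace a written justification.

```python
import math

def describe_pair(p_a, p_b, p_ab, tol=1e-12):
    """Summarize two events from P(A), P(B), and P(A and B) (hypothetical helper)."""
    # Basic consistency: the overlap cannot exceed either event,
    # and the union cannot exceed probability 1.
    if p_ab > min(p_a, p_b) + tol or p_a + p_b - p_ab > 1 + tol:
        raise ValueError("inconsistent probabilities")
    return {
        # The four regions of a two-set Venn diagram.
        "a_only": p_a - p_ab,
        "b_only": p_b - p_ab,
        "both": p_ab,
        "neither": 1 - (p_a + p_b - p_ab),
        # Conditional probabilities (undefined when conditioning on a null event).
        "p_a_given_b": p_ab / p_b if p_b > 0 else None,
        "p_b_given_a": p_ab / p_a if p_a > 0 else None,
        # Independence: P(A and B) equals P(A) * P(B), up to float tolerance.
        "independent": math.isclose(p_ab, p_a * p_b, abs_tol=tol),
        # Disjointness: the overlap has probability zero.
        "disjoint": math.isclose(p_ab, 0.0, abs_tol=tol),
    }

# Made-up inputs, chosen only to exercise the helper.
summary = describe_pair(0.6, 0.25, 0.15)
for name, value in summary.items():
    print(name, value)
```

The four region values should sum to 1, which is a quick way to catch a mislabeled diagram.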

1. Two events with partial overlap  
    - Given: $P(A) = 0.4, P(B) = 0.5, P(A \cap B) = 0.2.$  

    (a) Draw the Venn diagram and label each region ($A$ only, $B$ only, $A \cap B$, and neither).

    (b) Compute $P(A \mid B)$ and $P(B \mid A)$.

    (c) Are $A$ and $B$ independent? Are they disjoint?



2. Two events that are disjoint  
    - Given: $P(A) = 0.3, P(B) = 0.6, P(A \cap B) = 0.$  

    (a) Draw a Venn diagram consistent with these probabilities and label all regions.

    (b) Are $A$ and $B$ independent? Explain why disjointness does or does not imply independence.



3. Two events with overlap
    - Given: Sets $A$ and $B$ overlap, $P(A) = 0.5, P(B) = 0.4, P(A \cap B) = 0.3.$

    (a) Draw a Venn diagram consistent with these probabilities and label all regions.

    (b) Are $A$ and $B$ independent? Why?



4. Subset and overlapping events (three sets)  
    - Given three events $A, B, C$ with: $A \subset C$, $B$ overlaps $C$ 
      but is not a subset, and $A$ and $B$ are disjoint. 
      
      $P(A) = 0.2, P(B) = 0.3, P(C) = 0.6, P(B \cap C) = 0.1, P(A \cap B) = 0$.  

    (a) Draw a three-set Venn diagram consistent with these relations and label all regions. 
    
    (b) Compute $P(A \cup B \cup C)$.
    
    (c) Is any pair independent? Which pairs? Why?




5. Conditional independence scenario  
    - Setup: Describe events $A$ and $B$ that are not independent marginally 
      but are independent conditional on event $C.$  

    (a) Sketch Venn diagrams showing $A$ and $B$ marginally (over the whole
    sample space) and within the partition induced by $C$.
    
    (b) Provide a small numeric example (choose probabilities) that
    satisfies: 
    
    $P(A \cap B) \ne P(A)P(B)$ but $P(A \cap B | C) = P(A|C)P(B|C)$.
    
    (c) Explain why conditioning can create or remove independence.



6. From a standard 52-card deck, 
  let $A$ = "card is red", $B$ = "card is a face card (J,Q,K)", 
  $C$ = "card is a heart". 
      
    (a) Draw Venn diagrams showing the relationships between $A$, $B$, and $C$ (treat the sets as colors, suits, and ranks). Label regions with counts or probabilities. 
    
    (b) State which events are disjoint, which are subsets, and whether any pairs are independent.
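
For problems on a finite, equally likely sample space, the relationships in Part A can also be checked by brute-force enumeration. The sketch below does this for two fair dice rather than a card deck; the events `A`, `B`, `D` and the helper names are illustrative assumptions, not part of the assignment.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair six-sided dice.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a subset of omega) under equally likely outcomes."""
    return Fraction(len(event), len(omega))

def independent(e1, e2):
    """True when P(e1 and e2) == P(e1) * P(e2), using exact fractions."""
    return prob(e1 & e2) == prob(e1) * prob(e2)

def disjoint(e1, e2):
    """True when the two events share no outcome."""
    return e1.isdisjoint(e2)

# Hypothetical events, chosen to show one example of each relationship.
A = {w for w in omega if w[0] % 2 == 0}   # first die is even
B = {w for w in omega if sum(w) == 7}     # the two dice sum to 7
D = {(6, 6)}                              # both dice show 6

print("A, B independent:", independent(A, B))
print("B, D disjoint:", disjoint(B, D))
print("D subset of A:", D <= A)
print("A, D independent:", independent(A, D))
```

Exact `Fraction` arithmetic avoids the floating-point tolerance questions that come up when comparing products of decimals.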

### Part B. Discrete Random Variables

7. Basic conditional probability and independence  
    - Given events $A$ and $B$ with $P(A) = 0.45$, $P(B) = 0.30$, and $P(A \cap B) = 0.135$: 

      (a) Compute $P(A \mid B)$ and $P(B \mid A)$.  

      (b) Are $A$ and $B$ independent? Justify your answer.

8. Bayes' rule — medical test example  
    - A disease has prevalence 1%. A diagnostic test has sensitivity 95% ($P(\text{test}^+ \mid \text{disease})$)
      and specificity 90% ($P(\text{test}^- \mid \text{no disease})$).  

      (a) Compute $P(\text{disease} \mid \text{test}^+)$.  

      (b) Compute $P(\text{no disease} \mid \text{test}^-)$.  

      (c) Explain how prevalence affects the posterior probability.

9. Chain rule and conditional probability with three events  
    - Let $A, B, C$ be events with $P(A) = 0.6, P(B | A) = 0.5, P(B | A^c) = 0.2, 
      P(C | A \cap B) = 0.9, P(C | A \cap B^c) = 0.4, P(C | A^c \cap B) = 0.7, P(C | A^c \cap B^c) = 0.1.$  

      (a) Use the chain rule and total probability to compute $P(A \cap B \cap C)$.  

      (b) Compute $P(C)$ and $P(A | C)$.


10. Draw a 5-card hand uniformly at random from a standard $52$-card deck (without replacement).

    (a) Compute the probability the hand contains exactly two pairs (two distinct
    ranks each appearing twice, plus a fifth card of a different rank). Give the
    answer in terms of binomial coefficients (e.g., $n$ choose $k$).

    (b) Compute the probability the hand contains at least one ace. Express
    using complementary counting and combinatorial terms.

    (c) Given the hand contains at least one ace, compute the conditional
    probability that it contains exactly two aces.

11. Probability Mass Functions (PMF)

    - Let $X$ be a discrete random variable with the following probability mass function (PMF):  

    $$
    P(X = x) =
    \begin{cases} 
    0.1 & \text{if } x = 1, \\
    0.2 & \text{if } x = 2, \\
    0.3 & \text{if } x = 3, \\
    0.4 & \text{if } x = 4, \\
    0 & \text{otherwise.}
    \end{cases}
    $$

    (a) Verify that this is a valid PMF.  

    (b) Compute $P(X \leq 3)$.  

    (c) Compute the expected value $\mathbb{E}[X]$.  

    (d) Compute the variance $\text{Var}(X)$.  
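
Two of the calculations in this part, Bayes' rule for a binary test and the mean and variance of a PMF, can be sanity-checked with a few lines of code. The sketch below is one possible way to do so; the helper names and all numbers are made up and deliberately differ from those in problems 8 and 11.

```python
from fractions import Fraction

def posterior(prior, sensitivity, specificity):
    """P(condition | positive test) via Bayes' rule and total probability."""
    true_pos = sensitivity * prior                 # P(test+ and condition)
    false_pos = (1 - specificity) * (1 - prior)    # P(test+ and no condition)
    return true_pos / (true_pos + false_pos)

def is_valid_pmf(pmf):
    """Nonnegative masses summing to 1 (exact for Fractions; use a tolerance for floats)."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over x of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    mean = expectation(pmf)
    return expectation(pmf, lambda x: x * x) - mean ** 2

# Hypothetical inputs: prior 1/5, sensitivity 4/5, specificity 7/10.
example_posterior = posterior(Fraction(1, 5), Fraction(4, 5), Fraction(7, 10))

# Hypothetical PMF on {0, 1, 2}.
example_pmf = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}

print(example_posterior)
print(is_valid_pmf(example_pmf), expectation(example_pmf), variance(example_pmf))
```

Rerunning `posterior` with several priors while holding the other two arguments fixed is a simple way to explore how prevalence moves the result.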

### Part C. Continuous random variables — expectation and variance

For each problem below, show all integration steps and justify any changes of variables. Compute the expectation $\mathbb{E}[X]$ and the variance $\mathrm{Var}(X)$.
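
A numerical integral can serve as a check on hand-computed results, though it does not replace the integration steps. The sketch below uses a simple midpoint rule on a made-up density $f(x) = 2x$ on $[0, 1]$, which is not one of the densities in this part; the function names are illustrative.

```python
def midpoint_integral(g, lo, hi, n=10_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def pdf(x):
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

# Normalization, first moment, and second moment over the support.
total = midpoint_integral(pdf, 0.0, 1.0)
mean = midpoint_integral(lambda x: x * pdf(x), 0.0, 1.0)
second_moment = midpoint_integral(lambda x: x * x * pdf(x), 0.0, 1.0)

# Var(X) = E[X^2] - (E[X])^2
var = second_moment - mean ** 2

print(total, mean, var)
```

For this density the exact values are $\mathbb{E}[X] = 2/3$ and $\mathrm{Var}(X) = 1/18$, so the printed numbers should agree with those to several decimal places.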

12. Let $X\sim\mathrm{Uniform}(a,b)$ with $-\infty<a<b<\infty$.  
    (a) Write the pdf $f_X(x)$ and verify it integrates to $1$.  
    (b) Compute $\mathbb{E}[X]$ and $\mathrm{Var}(X)$ in terms of $a,b$.

13. Let $X$ have pdf

    $$
    f_X(x)=
    \begin{cases}
    kx^2, & 0\le x\le 1,\\[4pt]
    0, & \text{otherwise.}
    \end{cases}
    $$
    (a) Find the normalizing constant $k$.  
    (b) Compute $\mathbb{E}[X]$ and $\mathrm{Var}(X)$.

14. Let $X\sim\mathrm{Exponential}(\lambda)$ with $\lambda>0$, i.e. $f_X(x)=\lambda e^{-\lambda x}$ for $x\ge0$.  
    Compute $\mathbb{E}[X]$ and $\mathrm{Var}(X)$ (show the integrals).



### Part D. Functions often used in ML

15. Sigmoid Function

    - The sigmoid function is often used in machine learning and statistics to model probabilities. It is defined as:

    $$
    \sigma(x) = \frac{1}{1 + e^{-x}}
    $$

    (a) Compute the value of $\sigma(0)$ and interpret its meaning in terms of probability.

    (b) Show that $\sigma(-x) = 1 - \sigma(x)$.
    
    (c) Plot the sigmoid function for $x \in [-10, 10]$ and describe its key properties (e.g., range and asymptotes).


16. Hyperbolic Tangent Function

    - The hyperbolic tangent function, $\tanh(x)$, is another function commonly used in
      machine learning. It is defined as:

        $$
        \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
        $$

        (a) Compute the value of $\tanh(0)$.

        (b) Show that $\tanh(-x) = -\tanh(x)$.

        (c) Plot the $\tanh(x)$ function for $x \in [-10, 10]$ and describe its key properties (e.g., range, asymptotes, and symmetry).

        (d) Compare $\tanh(x)$ with the sigmoid function $\sigma(x)$ in terms of output range.
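
Before writing the algebraic arguments for problems 15(b) and 16(b), it can help to confirm the identities numerically on a grid. The sketch below uses only the Python standard library; the function names are illustrative, and the branch in `sigmoid` is a common trick to avoid overflow for large negative inputs rather than something the assignment requires. A numerical check like this is not a proof.

```python
import math

def sigmoid(x):
    """sigma(x) = 1 / (1 + exp(-x)), evaluated so that exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)          # algebraically equivalent form for x < 0
    return z / (1.0 + z)

def tanh_from_definition(x):
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x), straight from the definition."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

# Grid on [-10, 10] in steps of 0.5; the same points could be fed to a plotting library.
xs = [i / 2 for i in range(-20, 21)]

# Largest violation of sigma(-x) = 1 - sigma(x) over the grid.
max_sigmoid_err = max(abs(sigmoid(-x) - (1.0 - sigmoid(x))) for x in xs)

# Largest violation of tanh(-x) = -tanh(x) over the grid.
max_tanh_err = max(abs(tanh_from_definition(-x) + tanh_from_definition(x)) for x in xs)

# Agreement between the definition and the library implementation.
max_def_err = max(abs(tanh_from_definition(x) - math.tanh(x)) for x in xs)

print(max_sigmoid_err, max_tanh_err, max_def_err)
```

All three printed values should be at the level of floating-point rounding error.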