## Complete Case Analysis (Listwise Deletion) | [Link](https://github.com/AdilShamim8/50-Days-of-Machine-Learning/tree/main/Day%2019%20Complete%20Case%20Analysis)

### Overview

Complete Case Analysis, also called listwise deletion, is a straightforward technique for handling missing data: you exclude every row (case) that contains a missing value from your analysis. The approach relies on the assumption that data are missing completely at random (MCAR), meaning the probability that a value is missing is unrelated to both the observed and the unobserved data. When this assumption holds, the complete cases are a random subsample of the full dataset, so dropping incomplete cases does not bias your estimates, though it reduces the effective sample size.

### When to Use

- **MCAR Data:** If the missingness is completely random, dropping cases won’t bias your results.
- **Large Datasets:** When the dataset is large and only a small share of rows is incomplete, dropping those rows costs little precision.
- **Preliminary Analysis:** It can be useful during exploratory data analysis to quickly assess relationships without the complications of imputation.
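
To judge whether a dataset can absorb the loss, it helps to measure what share of rows would survive deletion. The following is a minimal sketch; the helper name `complete_case_fraction` and the sample values are assumptions made for this illustration and are not part of `pandas`.

```python
import numpy as np
import pandas as pd


def complete_case_fraction(df: pd.DataFrame) -> float:
    """Return the share of rows with no missing value in any column."""
    # notna() marks observed cells; all(axis=1) is True only for fully observed rows
    return float(df.notna().all(axis=1).mean())


# Hypothetical sample data for illustration
sample = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [5, np.nan, np.nan, 8, 10],
    'C': ['a', 'b', 'c', 'd', 'e'],
})

fraction_kept = complete_case_fraction(sample)
print(f"Rows kept by Complete Case Analysis: {fraction_kept:.0%}")
```

In this sample, three of the five rows are complete, so deletion would keep 60% of the data. What counts as an acceptable loss depends on the analysis.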

### Advantages

- **Simplicity:** Easy to implement and understand.
- **No Imputation Assumptions:** Avoids potential bias or error introduced by imputation techniques.
- **Consistency:** All analyses use only complete data, ensuring that sample size is the same across different analyses.

### Disadvantages

- **Data Loss:** Can result in a significant reduction in sample size if missing data are prevalent.
- **Potential Bias:** If data are not MCAR, removing cases may introduce bias into your analysis.
- **Reduced Statistical Power:** Fewer observations may lead to less precise estimates.
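
The bias risk can be illustrated with a small simulation. The sketch below uses made-up data and assumed missingness rates (30% at random in one case, 80% for values above 55 in the other); it compares the mean estimated from complete cases against the mean of the full data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 10_000

# Hypothetical variable drawn from a normal distribution centred on 50
full = pd.Series(rng.normal(loc=50, scale=10, size=n), name='x')

# MCAR: every value has the same 30% chance of being missing
mcar = full.mask(rng.random(n) < 0.3)

# Not MCAR: values above 55 go missing 80% of the time
not_mcar = full.mask((full > 55) & (rng.random(n) < 0.8))

# Complete Case Analysis: drop the missing values, then estimate the mean
full_mean = full.mean()
mcar_mean = mcar.dropna().mean()
not_mcar_mean = not_mcar.dropna().mean()

print(f"Full data mean:               {full_mean:.2f}")
print(f"Complete cases, MCAR:         {mcar_mean:.2f}")
print(f"Complete cases, not MCAR:     {not_mcar_mean:.2f}")
```

Under MCAR the complete-case mean stays close to the full-data mean. When large values are more likely to be missing, the complete-case mean is pulled downward, because the rows that remain are no longer a random subsample.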

### Python Implementation

Below is a Python example using the `pandas` library to perform Complete Case Analysis. The code demonstrates how to identify missing data, drop incomplete cases, and compare the dataset before and after deletion.

```python
import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {
    'A': [1, 2, np.nan, 4, 5],
    'B': [5, np.nan, np.nan, 8, 10],
    'C': ['a', 'b', 'c', 'd', 'e']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Check for missing values in each column
print("\nMissing values per column:")
print(df.isnull().sum())

# Complete Case Analysis: Drop rows with any missing values
df_complete = df.dropna()
print("\nDataFrame after Complete Case Analysis (drop rows with missing values):")
print(df_complete)

# Compare shape before and after
print("\nShape of original DataFrame:", df.shape)
print("Shape after Complete Case Analysis:", df_complete.shape)
```

#### Explanation

1. **Data Creation:** A sample DataFrame is created with some missing values (represented by `np.nan`).
2. **Missing Data Check:** The code prints the number of missing values in each column.
3. **Drop Missing Values:** The `dropna()` method, called with its default arguments, returns a new DataFrame without the rows that contain at least one missing value. The original `df` is left unchanged.
4. **Comparison:** The shapes of the original and cleaned DataFrames are printed to show how many rows were dropped. Here the shape goes from `(5, 3)` to `(3, 3)`, so two rows were removed.
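
When an analysis uses only some of the columns, checking every column can discard rows unnecessarily. `dropna()` accepts a `subset` argument that limits the check to the listed columns. The sketch below assumes a hypothetical analysis that needs only columns `A` and `C`; the choice of columns is for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [5, np.nan, np.nan, 8, 10],
    'C': ['a', 'b', 'c', 'd', 'e'],
})

# Columns the (hypothetical) analysis actually uses
analysis_columns = ['A', 'C']

# Drop a row only if one of the analysis columns is missing
df_subset = df.dropna(subset=analysis_columns)

# Default behaviour for comparison: any missing value removes the row
df_all = df.dropna()

rows_dropped_subset = len(df) - len(df_subset)
rows_dropped_all = len(df) - len(df_all)
print("Rows dropped when checking A and C only:", rows_dropped_subset)
print("Rows dropped when checking all columns: ", rows_dropped_all)
```

Restricting the check keeps the row where only `B` is missing, so one row is dropped instead of two. The trade-off is that analyses run on different column subsets may no longer share the same set of rows.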

---

### Summary

Complete Case Analysis is a simple and useful method for handling missing data, especially under the MCAR assumption. However, dropping rows loses information and can bias results when the missingness is systematic. In practice, evaluate the extent and pattern of missing data before choosing this approach.
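
One possible way to inspect the extent and pattern of missing data is to compute the missing rate per column and count how often each combination of missing columns occurs. This is a minimal sketch on made-up data; the pattern encoding (`1` for missing, `0` for observed, in column order) is an assumption chosen for readability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [5, np.nan, np.nan, 8, 10],
    'C': ['a', 'b', 'c', 'd', 'e'],
})

# Extent: share of missing values in each column
missing_rate = df.isna().mean()
print("Missing rate per column:")
print(missing_rate)

# Pattern: encode each row as a string such as '010' (only B missing)
row_patterns = df.isna().apply(
    lambda row: ''.join('1' if is_missing else '0' for is_missing in row),
    axis=1,
)
pattern_counts = row_patterns.value_counts()
print("\nMissingness patterns (columns in order A, B, C):")
print(pattern_counts)
```

A pattern table like this shows whether missing values cluster in a few columns or co-occur across columns. It does not prove or disprove MCAR, but it is a practical first look before deciding to drop rows.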
