### Day 73 of Programming

## Pandas vs. NumPy: A Comprehensive Comparison
### Introduction
Pandas and NumPy are essential Python libraries for data manipulation and analysis. While they overlap in functionality, their use cases differ significantly.

### Why Compare Pandas and NumPy?
Understanding the strengths and limitations of each library helps you choose the right tool for specific tasks.




In [1]:
import numpy as np

# Create a 2D array
array = np.array([[1, 2, 3], [4, 5, 6]])
print(array)


[[1 2 3]
 [4 5 6]]


In [2]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)


    Name  Age
0  Alice   25
1    Bob   30


### 2. Data Structure Comparison
NumPy: Works with homogeneous data (all elements must be of the same type).

Pandas: Handles heterogeneous data. Each column has its own type, so a single DataFrame can mix numeric and string columns.

#### Example: NumPy Enforces Data Type

In [3]:
array = np.array([1, 2, '3'])  # All elements become strings
print(array)


['1' '2' '3']
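
The coercion above follows from NumPy choosing one dtype for the whole array. The sketch below is illustrative only (the variable names are not from the original notebook) and shows two other outcomes of the same rule: numeric promotion, and opting out with `dtype=object`.

```python
import numpy as np

# Mixing integers and a string: NumPy picks one dtype for the whole array.
mixed = np.array([1, 2, '3'])
print(mixed.dtype.kind)  # 'U' means a Unicode string dtype

# Mixing integers and a float promotes every element to float64.
promoted = np.array([1, 2, 3.5])
print(promoted.dtype)

# dtype=object keeps the original Python objects, at the cost of speed.
as_objects = np.array([1, 2, '3'], dtype=object)
print([type(x).__name__ for x in as_objects])
```

An object array behaves more like a Python list, so most of NumPy's speed advantage is lost when it is used.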


#### Example: Pandas Allows Mixed Types

In [4]:
data = pd.DataFrame({'Age': [25, 30], 'Name': ['Alice', 'Bob']})
print(data.dtypes)  # Age is int, Name is object


Age      int64
Name    object
dtype: object
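
To make the per-column typing concrete, here is a small sketch (variable names are illustrative) that pulls the underlying NumPy arrays out of the same DataFrame. A single column keeps its own dtype, while converting the whole frame forces one common dtype.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30], 'Name': ['Alice', 'Bob']})

# A single column is backed by its own typed array.
age_values = df['Age'].to_numpy()
print(age_values.dtype.kind)  # 'i' means a signed integer dtype

# Converting the whole frame needs one dtype that fits every column,
# which for integers plus strings is the generic object dtype.
all_values = df.to_numpy()
print(all_values.dtype)
print(all_values.shape)
```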


### 3. Performance
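NumPy: Faster for numerical computations because arrays store values of a single type in contiguous memory and operations are vectorized.

Pandas: Usually somewhat slower because of the extra functionality it adds, such as index labels and missing-value handling, but it is built on NumPy arrays under the hood.

#### Performance Test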

In [5]:
import time

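# NumPy: note that the timed region also includes generating the random data
start = time.perf_counter()
np_array = np.random.rand(1000000)
np_sum = np_array.sum()
end = time.perf_counter()
print("NumPy Time:", end - start)

# Pandas: the timed region includes data generation and building the Series
start = time.perf_counter()
pd_series = pd.Series(np.random.rand(1000000))
pd_sum = pd_series.sum()
end = time.perf_counter()
print("Pandas Time:", end - start)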


NumPy Time: 0.01952052116394043
Pandas Time: 0.023206233978271484
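
A single run that also times data creation gives only a rough picture. As a hedged alternative, the sketch below uses the standard-library `timeit` module to time just the `sum()` call on data built beforehand. The repeat counts are arbitrary choices, and the results will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Build the data once, outside the timed region.
values = np.random.rand(1_000_000)
series = pd.Series(values)

# Run each sum 10 times per measurement and keep the best of 5 measurements
# to reduce noise from other processes.
numpy_time = min(timeit.repeat(values.sum, number=10, repeat=5))
pandas_time = min(timeit.repeat(series.sum, number=10, repeat=5))

print(f"NumPy sum:  {numpy_time / 10:.6f} s per call")
print(f"Pandas sum: {pandas_time / 10:.6f} s per call")
```

Because both objects hold the same values, any difference measured here comes from the summation call itself rather than from data generation.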



### 5. Visualization of Differences

#### Memory Usage

In [7]:
import sys

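# NumPy array memory usage
# sys.getsizeof reports the size of the Python object in bytes, including
# object overhead; exact numbers vary by platform and library version
array = np.array([1, 2, 3, 4, 5])
print("NumPy Memory:", sys.getsizeof(array))

# Pandas Series memory usage (a Series also carries an index)
series = pd.Series([1, 2, 3, 4, 5])
print("Pandas Memory:", sys.getsizeof(series))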


NumPy Memory: 132
Pandas Memory: 204
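
`sys.getsizeof` mixes object overhead with the data itself. For a closer look at the data alone, a sketch using each library's own accounting (`ndarray.nbytes` and `Series.memory_usage`) might look like this. The explicit `int64` dtype is an assumption made here so that the byte counts do not depend on the platform.

```python
import numpy as np
import pandas as pd

array = np.arange(5, dtype=np.int64)
series = pd.Series(array)

# nbytes counts only the raw data buffer: 5 elements * 8 bytes each.
print("NumPy data bytes:", array.nbytes)

# memory_usage(index=False) counts the Series values without the index.
print("Pandas data bytes:", series.memory_usage(index=False))

# Including the index shows the extra cost of the row labels.
print("Pandas bytes with index:", series.memory_usage(index=True))
```

The values themselves take the same space in both libraries. The extra memory in Pandas comes from the index and object bookkeeping.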


### Handling Missing Values

In [8]:
# Pandas converts None to NaN and skips it by default in sum()
data = pd.Series([1, 2, None, 4])
print("Pandas Sum with NaN:", data.sum())

# NumPy stores the missing value as np.nan; a plain np.sum would return nan,
# so np.nansum is needed to skip it
array = np.array([1, 2, np.nan, 4])
print("NumPy Sum with NaN:", np.nansum(array))  # Requires np.nansum


Pandas Sum with NaN: 7.0
NumPy Sum with NaN: 7.0
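
Both calls above skip the missing value, which hides the difference in defaults. The sketch below (variable names are illustrative) shows what each library does when the missing value is not skipped.

```python
import math

import numpy as np
import pandas as pd

array = np.array([1, 2, np.nan, 4])
series = pd.Series([1, 2, None, 4])

# A plain NumPy sum propagates NaN to the result.
plain_sum = np.sum(array)
print("np.sum:", plain_sum)

# np.nansum treats NaN as zero.
nan_safe_sum = np.nansum(array)
print("np.nansum:", nan_safe_sum)

# Pandas skips NaN by default; skipna=False matches NumPy's plain sum.
strict_sum = series.sum(skipna=False)
print("Series.sum(skipna=False):", strict_sum)

# Pandas can also report how many values are missing.
missing_count = int(series.isna().sum())
print("Missing values:", missing_count)
```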



### Conclusion
Both Pandas and NumPy are vital in the Python data ecosystem. Use NumPy for fast numerical computation on homogeneous arrays, and Pandas for manipulating labeled, mixed-type tabular data.