# 1. Data Structure Flexibility

- **NumPy**: Built around the `ndarray`, which is highly efficient but homogeneous: every element shares one data type (e.g., all integers or all floats), and mixed input is coerced to a common type.
- **Pandas**: Introduces the DataFrame (a table with rows and columns) and the Series (a labeled one-dimensional array). Each DataFrame column keeps its own data type, so strings, integers, and floats can sit side by side in the same table. This flexibility makes it much better for working with real-world, heterogeneous datasets.


In [18]:
# NumPy Example
import numpy as np

# Creating a 2D array with only integers
data = np.array([[1, 2, 3], [4, 5, 6]])
data


array([[1, 2, 3],
       [4, 5, 6]])

In [19]:
# Pandas Example
import pandas as pd

# Creating a DataFrame with mixed data types
data = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Salary': [50000, 60000]
})
data


    Name  Age  Salary
0  Alice   25   50000
1    Bob   30   60000
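
The difference in type handling can be checked directly. The following is a minimal sketch, with made-up values and illustrative variable names (`mixed_array`, `people`), showing that NumPy coerces mixed input to one common type while a DataFrame tracks a type per column:

```python
import numpy as np
import pandas as pd

# Mixing strings and numbers in one NumPy array coerces every element
# to a common type; here the integers become strings.
mixed_array = np.array(['Alice', 25, 50000])
print(mixed_array.dtype)   # a Unicode string dtype
print(mixed_array[1])      # the string '25', no longer an integer

# A DataFrame stores each column with its own dtype.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Salary': [50000.0, 60000.0]
})
# Name is text (object or string dtype, depending on the pandas version),
# Age is an integer column, Salary is a float column.
print(people.dtypes)
```

Because `Age` stays numeric inside the DataFrame, calls such as `people['Age'].mean()` work without any conversion step.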


# 2. Handling Missing Data

- **NumPy**: Plain arrays have no dedicated missing-value marker. Gaps are usually represented as `np.nan`, which forces a float type, and handling them often requires extra steps and custom code.
- **Pandas**: Has built-in methods (`fillna`, `dropna`) for filling or removing missing data, making it easier to work with incomplete datasets and reducing the need for extra coding.


In [20]:
# NumPy Example
import numpy as np

# Creating an array with a NaN value (requires float type)
data = np.array([1, 2, np.nan, 4])
print(data)

# Handling missing data manually
data = np.nan_to_num(data)  # Replace NaN with 0
print(data)


[ 1.  2. nan  4.]
[1. 2. 0. 4.]


In [21]:
# Pandas Example
import pandas as pd
import numpy as np

# Creating a DataFrame with missing values
data = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35]
})
print(data)

# Handling missing data in a simple way
data = data.fillna(0)  # Fill NaNs with 0
data


      Name   Age
0    Alice  25.0
1      Bob   NaN
2  Charlie  35.0


      Name   Age
0    Alice  25.0
1      Bob   0.0
2  Charlie  35.0
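
Filling with 0 is only one option. As a rough sketch (the data and variable names are illustrative), the snippet below shows the "extra steps" on the NumPy side, a NaN-aware mean and a boolean mask, next to `dropna` and a mean-based `fillna` in Pandas:

```python
import numpy as np
import pandas as pd

ages = np.array([25, np.nan, 35])

# NumPy: NaN-aware reductions skip the missing entry.
mean_age = np.nanmean(ages)

# NumPy: a boolean mask keeps only the entries that are not NaN.
valid_ages = ages[~np.isnan(ages)]

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35]
})

# Pandas: dropna removes every row that contains a missing value.
complete_rows = people.dropna()

# Pandas: fill the gap with the column mean instead of a constant.
# Series.mean() skips NaN by default.
filled = people.fillna({'Age': people['Age'].mean()})

print(mean_age)
print(valid_ages)
print(complete_rows)
print(filled)
```

Whether to drop, zero-fill, or impute with a statistic depends on the analysis; zero-filling an age column, for example, would distort any later average.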


# 3. Label-Based Indexing

- **NumPy**: Uses position-based indexing (i.e., rows and columns are accessed by their integer positions). Because a position carries no meaning on its own, this can lead to errors and make code harder to understand, especially with large datasets.
- **Pandas**: Allows both position-based and label-based indexing, letting you access data by names (e.g., column names or row labels), which improves readability and accuracy in data selection.


In [22]:
# NumPy Example
import numpy as np

data = np.array([[10, 20, 30], [40, 50, 60]])

# Accessing by position
data[0, 1]  # Output: 20


np.int64(20)

In [23]:
# Pandas Example
import pandas as pd

data = pd.DataFrame({
    'A': [10, 40],
    'B': [20, 50],
    'C': [30, 60]
}, index=['Row1', 'Row2'])

# Accessing by label
data.loc['Row1', 'B']  # Output: 20


np.int64(20)
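
To make the contrast between the two indexing styles concrete, here is a small sketch that reuses the DataFrame above. The `reordered` frame is an illustrative addition: it shows that a label lookup survives a change in column order while a positional lookup does not.

```python
import pandas as pd

data = pd.DataFrame({
    'A': [10, 40],
    'B': [20, 50],
    'C': [30, 60]
}, index=['Row1', 'Row2'])

# Label-based: select by row label and column name.
by_label = data.loc['Row1', 'B']

# Position-based: select by integer position, as in NumPy.
by_position = data.iloc[0, 1]

# After reordering the columns, the label lookup still finds column B,
# but position 1 now points at column A.
reordered = data[['B', 'A', 'C']]
label_after = reordered.loc['Row1', 'B']
position_after = reordered.iloc[0, 1]

print(by_label, by_position)
print(label_after, position_after)
```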

# 4. Data Manipulation and Transformation

- **NumPy**: Supports mathematical operations on arrays but has no built-in tools for table-style manipulation tasks, like key-based merging, grouping, or pivoting tables.
- **Pandas**: Offers easy-to-use, powerful methods for complex data manipulation, such as merging datasets, grouping rows by a specific column, and creating pivot tables. These tools are crucial for cleaning and organizing data in a structured way.


In [24]:
# NumPy Example
import numpy as np

# Creating two arrays
data1 = np.array([1, 2, 3])
data2 = np.array([4, 5, 6])

# Basic operation (element-wise addition)
result = data1 + data2
result


array([5, 7, 9])

In [25]:
# Pandas Example
import pandas as pd

# Creating two DataFrames
data1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
data2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Stacking DataFrames row-wise (pd.concat appends rows; pd.merge performs key-based joins)
merged_data = pd.concat([data1, data2], ignore_index=True)
merged_data


   A  B
0  1  3
1  2  4
2  5  7
3  6  8
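
The merging, grouping, and pivoting mentioned above can be sketched as follows. The tables (`employees`, `departments`, `sales`) and their column names are hypothetical and exist only for illustration:

```python
import pandas as pd

# Hypothetical tables that share a DeptId key.
employees = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'DeptId': [1, 2, 1]
})
departments = pd.DataFrame({
    'DeptId': [1, 2],
    'Dept': ['Sales', 'IT']
})

# Merging: a key-based join on the shared column.
merged = pd.merge(employees, departments, on='DeptId')

# Grouping: count employees per department.
headcount = merged.groupby('Dept')['Name'].count()

# Pivoting: one row per department, one column per quarter.
sales = pd.DataFrame({
    'Dept': ['Sales', 'Sales', 'IT', 'IT'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Revenue': [100, 150, 80, 120]
})
pivot = sales.pivot_table(index='Dept', columns='Quarter',
                          values='Revenue', aggfunc='sum')

print(merged)
print(headcount)
print(pivot)
```

Unlike `pd.concat`, which stacks tables along an axis, `pd.merge` matches rows by key values, so the order of rows in the two inputs does not matter.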


# 5. Time-Series Analysis

- **NumPy**: Can handle arrays with timestamps but lacks advanced support for analyzing time-based data.
- **Pandas**: Has extensive time-series functionality, including time-based indexing, resampling, and rolling window calculations, making it much easier to analyze trends over time.


In [31]:
# NumPy Example
import numpy as np

# Creating a simple array with timestamp-like values
data = np.array(['2023-01-01', '2023-01-02', '2023-01-03'], dtype='datetime64[D]')
print(data)

# NumPy supports date arithmetic on datetime64 values,
# but has no built-in resampling or time-based indexing


['2023-01-01' '2023-01-02' '2023-01-03']


In [29]:
# Pandas Example
import pandas as pd

# Creating a DataFrame with time-series data
date_range = pd.date_range(start='2023-01-01', periods=3)
data = pd.DataFrame({
    'Date': date_range,
    'Value': [10, 20, 30]
})
data

        Date  Value
0 2023-01-01     10
1 2023-01-02     20
2 2023-01-03     30


In [30]:
# Resampling data to calculate weekly averages
# ('W' bins end on Sunday by default; 2023-01-01 is a Sunday, so it forms its own bin)
data.set_index('Date', inplace=True)
weekly_data = data.resample('W').mean()
weekly_data

            Value
Date
2023-01-01   10.0
2023-01-08   25.0
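
Time-based indexing and rolling windows, the other two features named above, can be sketched in the same way. The five-day series below is made up for illustration:

```python
import pandas as pd

dates = pd.date_range(start='2023-01-01', periods=5)
series = pd.Series([10, 20, 30, 40, 50], index=dates)

# Time-based indexing: slice by date strings (both endpoints are included).
jan_2_to_4 = series.loc['2023-01-02':'2023-01-04']

# Rolling window: mean over the current and two preceding days.
# The first two entries are NaN because the window is not yet full.
rolling_mean = series.rolling(window=3).mean()

print(jan_2_to_4)
print(rolling_mean)
```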


# 6. Data Alignment and Indexing on Merge

- **NumPy**: Requires manual alignment of data if arrays from different sources need to be combined or reshaped, which can be complex.
- **Pandas**: Automatically aligns data based on row and column labels during merges and transformations, saving time and reducing potential errors.


In [32]:
# NumPy Example
import numpy as np

# NumPy arrays carry no labels, so elements can only be matched by position
data1 = np.array([1, 2, 3])
data2 = np.array([4, 5, 6])

# Concatenation joins the arrays end to end; any alignment is up to the caller
merged_data = np.concatenate((data1, data2))
print(merged_data)


[1 2 3 4 5 6]


In [37]:
# Pandas Example
import pandas as pd

# Creating two DataFrames with different indexes
data1 = pd.DataFrame({'A': [1, 2]}, index=['Row1', 'Row2'])
data2 = pd.DataFrame({'B': [3, 4]}, index=['Row1', 'Row3'])

# Combining column-wise: rows are aligned on index labels,
# and labels missing from one side are filled with NaN
aligned_data = pd.concat([data1, data2], axis=1)
aligned_data


        A    B
Row1  1.0  3.0
Row2  2.0  NaN
Row3  NaN  4.0
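
Alignment also applies to arithmetic, not only to `pd.concat`. A minimal sketch, using made-up Series whose labels appear in different orders, compares label-aligned addition with NumPy's purely positional addition:

```python
import pandas as pd

s1 = pd.Series([1, 2], index=['Row1', 'Row2'])
s2 = pd.Series([10, 20], index=['Row2', 'Row1'])  # same labels, reversed order

# Pandas matches values by label before adding:
# Row1 -> 1 + 20, Row2 -> 2 + 10
aligned_sum = s1 + s2

# The underlying NumPy arrays are added by position instead:
# [1 + 10, 2 + 20]
positional_sum = s1.to_numpy() + s2.to_numpy()

# Labels present on only one side would give NaN;
# fill_value treats the missing side as 0.
s3 = pd.Series([100], index=['Row3'])
filled_sum = s1.add(s3, fill_value=0)

print(aligned_sum)
print(positional_sum)
print(filled_sum)
```

The positional result is silently different from the aligned one, which is the kind of error that label-based alignment is meant to prevent.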
