## Memory Management

1. **Data Types**: Choosing the smallest data type that can hold your values can significantly reduce memory footprint. For example, `int8` stores each value in 1 byte, while `int64` uses 8:

```python
import pandas as pd
import numpy as np

# Create a DataFrame with one million int64 values in the range 0-99
df = pd.DataFrame({'A': np.random.randint(0, 100, 1000000, dtype=np.int64)})

# Check memory usage
print(f"Memory usage (int64): {df.memory_usage(deep=True).sum() / 1024 ** 2:.2f} MB")

# Convert to int8 (1 byte instead of 8 bytes); safe here because 0-99 fits in int8's range of -128 to 127
df['A'] = df['A'].astype('int8')

# Check memory usage again
print(f"Memory usage (int8): {df.memory_usage(deep=True).sum() / 1024 ** 2:.2f} MB")
```
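Picking the target type by hand means checking the value range yourself. As an alternative, `pd.to_numeric` with `downcast='integer'` selects the smallest signed integer type that can represent every value. Here is a minimal sketch on two small illustrative Series (`small` and `medium` are hypothetical example data):

```python
import numpy as np
import pandas as pd

# Illustrative example columns (hypothetical data)
small = pd.Series(np.arange(0, 100, dtype=np.int64))    # values 0..99
medium = pd.Series(np.arange(0, 1000, dtype=np.int64))  # values 0..999

# downcast='integer' picks the smallest signed integer dtype that holds every value
small_down = pd.to_numeric(small, downcast='integer')
medium_down = pd.to_numeric(medium, downcast='integer')

print(small.dtype, '->', small_down.dtype)    # int64 -> int8
print(medium.dtype, '->', medium_down.dtype)  # int64 -> int16
```

The values themselves are unchanged. Only the storage type shrinks, and it shrinks only as far as the data allows.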

2. **Categorical Data**: For columns with a small, fixed set of repeated values (low cardinality), use Pandas' `category` data type:

```python
# Create a DataFrame with categorical data
df = pd.DataFrame({'Category': np.random.choice(['A', 'B', 'C'], size=1000000)})

# Check memory usage with object data type
print(f"Memory usage (object): {df.memory_usage(deep=True).sum() / 1024 ** 2:.2f} MB")

# Convert to categorical data type
df['Category'] = df['Category'].astype('category')

# Check memory usage again
print(f"Memory usage (categorical): {df.memory_usage(deep=True).sum() / 1024 ** 2:.2f} MB")
```
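The savings come from how a categorical column is stored. Each distinct value is kept once in `.cat.categories`, and each row holds only a small integer code (`.cat.codes`) that points into that list. The sketch below inspects both on a tiny illustrative column:

```python
import pandas as pd

# Tiny illustrative column with repeated string values
s = pd.Series(['B', 'A', 'C', 'A', 'B']).astype('category')

# Each distinct value is stored once (string categories are sorted)
print(list(s.cat.categories))  # ['A', 'B', 'C']

# Each row stores only an integer code pointing into the categories
print(s.cat.codes.tolist())    # [1, 0, 2, 0, 1]

# The code type is chosen from the number of categories; three fit in int8
print(s.cat.codes.dtype)       # int8
```

For high-cardinality columns, where most values are unique, the categories array is nearly as large as the original data. In that case the savings largely disappear.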

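3. **Chunking**: For datasets too large to load into memory at once, pass `chunksize` to `pd.read_csv`. It returns an iterator of smaller DataFrames that you can process one at a time:
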
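```python
# Read a large CSV file in chunks of 1,000,000 rows ("large_data.csv" is a placeholder path)
reader = pd.read_csv("large_data.csv", chunksize=1000000)

# Process each chunk and accumulate the results
results = []
for chunk in reader:
    # Square the numeric columns with a vectorized operation;
    # apply(..., axis=1) would call a Python function once per row and be much slower
    processed = chunk.select_dtypes(include='number') ** 2
    # Keep only a small per-chunk summary so memory use stays bounded
    results.append(processed.sum())

# Combine the per-chunk summaries into per-column totals
total = pd.concat(results, axis=1).sum(axis=1)
```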
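You can try the chunked pattern without a large file on disk. The sketch below reads an in-memory CSV (via `io.StringIO`) two rows at a time and accumulates a per-column sum of squares. The tiny CSV and the column names `x` and `y` are illustrative assumptions. With a real file you would pass its path and a much larger `chunksize`.

```python
import io
import pandas as pd

# Illustrative in-memory CSV standing in for a large file on disk
csv_text = """x,y
1,2
3,4
5,6
7,8
9,10
"""

# Running totals, updated once per chunk, so only one chunk is in memory at a time
totals = None
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Vectorized: square every value in the chunk, then sum each column
    chunk_sum = (chunk ** 2).sum()
    totals = chunk_sum if totals is None else totals + chunk_sum

# Expected: x = 1 + 9 + 25 + 49 + 81 = 165, y = 4 + 16 + 36 + 64 + 100 = 220
print(totals)
```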

## Vectorized Operations

Vectorized operations in Pandas apply an operation to an entire column or array at once, without an explicit Python loop. The per-element loop runs inside NumPy's compiled code instead.
The key advantages of vectorized operations are:

- **Performance:** Vectorized operations are significantly faster than iterating over elements using Python loops, especially for large datasets.
- **Concise and Readable Code:** Vectorized operations often result in more concise and expressive code, making it easier to read and maintain.
- **Memory Efficiency:** Vectorized operations work on compact, typed arrays rather than on individual Python objects, so they avoid per-element object overhead. Chained expressions such as `a + b * c` still allocate temporary arrays for intermediate results.
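A quick way to check the performance claim on your own machine is to time the same computation written as a Python loop and as a vectorized expression. The sketch below uses `time.perf_counter` and its own illustrative DataFrame named `data`. Timings depend on hardware and library versions, so the sketch prints them rather than assuming a particular speedup.

```python
import time
import numpy as np
import pandas as pd

n = 200_000
data = pd.DataFrame({'A': np.random.rand(n), 'B': np.random.rand(n)})

# Python loop: one interpreter iteration (and NumPy scalar objects) per element
start = time.perf_counter()
a_vals, b_vals = data['A'].to_numpy(), data['B'].to_numpy()
loop_result = np.empty(n)
for i in range(n):
    loop_result[i] = a_vals[i] + b_vals[i]
loop_time = time.perf_counter() - start

# Vectorized: a single call whose element loop runs in compiled code
start = time.perf_counter()
vec_result = (data['A'] + data['B']).to_numpy()
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
print("same result:", np.allclose(loop_result, vec_result))
```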

1. **Element-wise Operations**: Basic arithmetic operations are automatically vectorized:

```python
import pandas as pd
import numpy as np

# Create a DataFrame
df = pd.DataFrame({'A': np.random.rand(1000000), 'B': np.random.rand(1000000)})

# Vectorized arithmetic operations
df['C'] = df['A'] + df['B']
df['D'] = df['A'] * 2
```
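Pandas arithmetic differs from raw NumPy arithmetic in one important way: operations between two Series are aligned on index labels, not on position. A label present in only one operand produces `NaN`. Here is a small sketch with illustrative labels:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are matched by label: 'b' and 'c' exist in both, 'a' and 'd' do not
total = s1 + s2
print(total.tolist())  # [nan, 12.0, 23.0, nan]

# fill_value treats a label missing from one side as 0 instead of producing NaN
print(s1.add(s2, fill_value=0).tolist())  # [1.0, 12.0, 23.0, 30.0]
```

Within a single DataFrame, such as `df['A'] + df['B']` above, both columns share the same index, so alignment is invisible.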

2. **Universal Functions (ufuncs)**: Apply NumPy's ufuncs to Pandas data structures:

```python
# Vectorized logarithm
df['log_A'] = np.log(df['A'])

# Vectorized exponential
df['exp_B'] = np.exp(df['B'])
```
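A NumPy ufunc called on a Series returns a Series with the same index, so the results line up with the original rows. Inputs outside a function's domain do not raise an error. `np.log` returns `-inf` for 0 and `NaN` for negative values, emitting a `RuntimeWarning`. This matters above because `np.random.rand` can, in principle, return 0. Here is a short sketch on illustrative values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 0.0, -1.0], index=['x', 'y', 'z'])

# Suppress the divide-by-zero and invalid-value warnings for out-of-domain inputs
with np.errstate(divide='ignore', invalid='ignore'):
    logs = np.log(s)

print(type(logs).__name__)  # Series
print(logs.index.tolist())  # ['x', 'y', 'z']
print(logs.tolist())        # [0.0, -inf, nan]
```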

3. **Conditional Operations**: Use vectorized conditional operations:


```python
# Vectorized conditional selection
df['label'] = np.where(df['A'] > 0.5, 'High', 'Low')

# Vectorized boolean indexing
high_values = df[df['A'] > 0.5]
```
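`np.where` handles a single condition. For more than two labels, `np.select` evaluates a list of conditions in order and takes the choice for the first condition that is true. Rows matching none of the conditions get the `default` value. Here is a sketch with illustrative thresholds and its own small DataFrame named `scores`:

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({'A': [0.1, 0.45, 0.6, 0.95]})

# Conditions are checked in order; for each row, the first true one wins
conditions = [scores['A'] > 0.9, scores['A'] > 0.5]
choices = ['Very High', 'High']
scores['level'] = np.select(conditions, choices, default='Low')

print(scores['level'].tolist())  # ['Low', 'Low', 'High', 'Very High']
```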

4. **Aggregations**: Perform vectorized aggregations:

```python
# Vectorized sum
total = df['A'].sum()

# Vectorized mean
mean_value = df['B'].mean()

# Groupby and aggregation
group_stats = df.groupby('label')['A'].agg(['mean', 'std'])
```
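`agg` also accepts named aggregations. These give the output columns explicit names and let you aggregate different columns in one pass. Here is a sketch on a small illustrative table (the `table` name and the `group`, `A`, and `B` columns are assumptions for the example):

```python
import pandas as pd

table = pd.DataFrame({
    'group': ['x', 'y', 'x', 'y', 'x'],
    'A': [1.0, 2.0, 3.0, 4.0, 5.0],
    'B': [10, 20, 30, 40, 50],
})

# Each keyword becomes an output column: new_name=(source_column, aggregation)
stats = table.groupby('group').agg(
    mean_A=('A', 'mean'),
    max_B=('B', 'max'),
    rows=('A', 'size'),
)
print(stats)
```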

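5. **Apply and Lambda**: `apply()` with a Python function or lambda calls that function once per element, so it is *not* vectorized. Prefer built-in vectorized methods, and reserve `apply()` for custom logic that has no vectorized equivalent: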

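```python
# Vectorized square root: call the ufunc on the column directly
df['sqrt_A'] = np.sqrt(df['A'])

# Vectorized string operations via the .str accessor
df['upper_str'] = df['label'].str.upper()

# Non-vectorized equivalent: the lambda runs once per row in Python
df['upper_str_slow'] = df['label'].apply(lambda x: x.upper())
```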
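A common place where `apply` hides a slow loop is `apply(..., axis=1)`, which calls the function once per row and passes each row as a Series. When the per-row logic is arithmetic or a condition, it can usually be rewritten with whole-column operations. Here is a sketch comparing the two on an illustrative DataFrame named `frame`:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'A': [0.25, 0.75, 1.5], 'B': [1.0, 2.0, 3.0]})

# Row-wise apply: the lambda runs once per row in Python
slow = frame.apply(lambda row: row['A'] * row['B'] if row['A'] > 0.5 else 0.0, axis=1)

# Vectorized rewrite: the same logic expressed with whole-column operations
fast = pd.Series(np.where(frame['A'] > 0.5, frame['A'] * frame['B'], 0.0), index=frame.index)

print(slow.tolist())  # [0.0, 1.5, 4.5]
print(fast.tolist())  # [0.0, 1.5, 4.5]
```

`np.where` evaluates both branches for every row, which is usually much cheaper than one Python call per row.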

By leveraging memory-efficient data storage and vectorized operations, Pandas enables high-performance data handling, even for large datasets. Always profile your code and use the appropriate techniques for your specific use case.