### Use generators for large data 

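Generators produce items one at a time instead of storing the entire dataset in memory. Memory use stays roughly constant regardless of dataset size, and the code can be faster when the full list is never needed. The trade-off is that a generator can be iterated only once and supports neither indexing nor len().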

In [1]:
# Inefficient
data = [x for x in range(10**6)]

# Efficient
data = (x for x in range(10**6))  # Saves memory
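
To make the saving concrete, the following sketch compares the sizes reported by sys.getsizeof, which counts only the list or generator object itself and not the integers it refers to. The variable names are illustrative.

```python
import sys

n = 10**6

as_list = [x for x in range(n)]   # all values exist at once
as_gen = (x for x in range(n))    # values are produced on demand

list_bytes = sys.getsizeof(as_list)  # several megabytes of pointers
gen_bytes = sys.getsizeof(as_gen)    # a few hundred bytes at most

print(f"list: {list_bytes:,} bytes, generator: {gen_bytes:,} bytes")

# A generator can be consumed only once.
total = sum(as_gen)
leftover = list(as_gen)  # empty: the generator is already exhausted
```

If the data must be traversed more than once, either rebuild the generator or fall back to a list.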

### Avoid using global variables 

Passing state in as arguments and returning the result, rather than mutating global variables, avoids hidden side effects and makes functions easier to test and debug.

In [2]:
# Discouraged: mutates module-level state as a side effect
count = 0
def increment():
    global count
    count += 1

# Preferred: takes the current value and returns the new one
def increment(count):
    return count + 1
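
Because the argument-passing version depends only on its input, it can be checked with plain assertions and needs no setup or teardown. A minimal sketch, in which the variable name visits is illustrative:

```python
def increment(count):
    """Return the next counter value without touching any outer state."""
    return count + 1

# The caller owns the state and decides where to keep it.
visits = 0
visits = increment(visits)
visits = increment(visits)
print(visits)  # 2

# The same input always gives the same output, so tests need no reset step.
assert increment(41) == 42
```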

### Use context managers for resource management 

Context managers ensure that resources are released (e.g., files are closed) when the block exits, even if an exception is raised inside it, so no explicit close call is needed.

In [4]:
# Fragile: the file stays open if read() raises before close() runs
file = open("file.txt", "r")
data = file.read()
file.close()

# Robust: the file is closed automatically when the block exits
with open("file.txt", "r") as file:
    data = file.read()   # No need to explicitly close the file here 
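
The difference matters most when something fails partway through. The sketch below uses a temporary file and a deliberately raised error as stand-ins, and shows that the file is closed even though the block exits with an exception.

```python
import os
import tempfile

# Create a small throwaway file to read from.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("example contents")
    path = tmp.name

try:
    with open(path, "r") as file:
        data = file.read()
        # Illustrative failure raised while the file is still open.
        raise ValueError("simulated failure while processing")
except ValueError:
    pass

print(file.closed)  # True

os.remove(path)  # clean up the throwaway file
```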

### Use sets for membership testing

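Membership tests in sets are O(1) on average, whereas lists are O(n) because each element may have to be compared in turn. The difference is negligible for five items but grows with the size of the collection. Building the set itself costs O(n), so the conversion pays off when the same collection is tested many times.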

In [5]:
# Inefficient
items = [1, 2, 3, 4, 5]
if 3 in items:
    print("Found")

# Efficient
items = {1, 2, 3, 4, 5}
if 3 in items:
    print("Found")

Found
Found
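
With five items the two versions are indistinguishable in practice. A rough way to see the gap on a larger collection is timeit; the size and repeat count below are arbitrary choices, and absolute timings vary by machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: the last element

list_time = timeit.timeit(lambda: target in as_list, number=200)
set_time = timeit.timeit(lambda: target in as_set, number=200)

print(f"list: {list_time:.5f} s, set: {set_time:.5f} s")
```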


### Use default argument values 

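A default argument value lets callers omit the argument and removes the need for a conditional check inside the function, which simplifies the code. The two versions below are not strictly equivalent: the first requires an argument and treats an explicit None as "Guest", while the second prints "Hello, None!" if None is passed.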

In [6]:
# Verbose: the caller must pass None and the function checks for it
def greet(name):
    if name is None:
        name = "Guest"
    print(f"Hello, {name}!")

# Concise: the default applies whenever the argument is omitted
def greet(name="Guest"):
    print(f"Hello, {name}!")
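
How each style treats an explicit None can be checked directly. The sketch below returns the greeting instead of printing it so the results are easy to compare; greet_default and greet_optional are hypothetical names.

```python
def greet_default(name="Guest"):
    """Fall back to "Guest" only when the argument is omitted."""
    return f"Hello, {name}!"

def greet_optional(name=None):
    """Fall back to "Guest" when the argument is omitted or is None."""
    if name is None:
        name = "Guest"
    return f"Hello, {name}!"

print(greet_default())        # Hello, Guest!
print(greet_default(None))    # Hello, None!
print(greet_optional(None))   # Hello, Guest!
print(greet_optional("Ada"))  # Hello, Ada!
```

The second form is the usual choice when callers may legitimately pass None, for example when forwarding an optional value.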

### Optimize data types in pandas

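Choosing smaller data types (e.g., int32 instead of int64) halves the memory used per value, which adds up for large DataFrames. This is safe only when every value fits the smaller type: int32 covers roughly ±2.1 billion, and float32 keeps about 7 significant decimal digits.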

In [8]:
import pandas as pd

df = pd.read_csv('large_file.csv', dtype={'column1': 'int32', 'column2': 'float32'})
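
The effect can be measured with memory_usage. The sketch below builds a frame in memory rather than reading large_file.csv, so the data and row count are made up; the column names follow the example above.

```python
import numpy as np
import pandas as pd

n = 100_000
df = pd.DataFrame({
    "column1": np.arange(n, dtype="int64"),
    "column2": np.linspace(0.0, 1.0, n),  # float64 by default
})

before = df.memory_usage(index=False).sum()

# Downcast only after confirming the values fit the smaller types.
small = df.astype({"column1": "int32", "column2": "float32"})
after = small.memory_usage(index=False).sum()

print(f"before: {before:,} bytes, after: {after:,} bytes")
```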

### Use sparse data structures

Sparse data structures suit datasets in which most values are zero or some other single fill value. They save memory by storing only the values that differ from the fill value, together with their positions.

In [9]:
import pandas as pd

# Creating a sparse DataFrame
sparse_df = pd.DataFrame({'A': pd.arrays.SparseArray([1, 0, 0, 1, 0])})
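
Five values are too few to show a saving. The sketch below compares a dense and a sparse Series on a larger, made-up array in which 1% of the entries are non-zero; the size and fill pattern are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

n = 100_000
values = np.zeros(n, dtype="int64")
values[::100] = 1  # 1% of the entries are non-zero

dense = pd.Series(values)
sparse = pd.Series(pd.arrays.SparseArray(values, fill_value=0))

dense_bytes = dense.memory_usage(index=False)
sparse_bytes = sparse.memory_usage(index=False)
density = sparse.sparse.density  # fraction of values actually stored

print(f"dense: {dense_bytes:,} bytes, sparse: {sparse_bytes:,} bytes")
print(f"density: {density:.2%}")
```

The saving shrinks as density rises, because each stored value also carries its position.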

### Use 'del' to free memory

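del removes a name's reference to an object. In CPython, an object's memory is reclaimed once no references to it remain, so del frees memory only when the deleted name held the last reference.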

In [10]:
import pandas as pd

df = pd.read_csv('large_file.csv')

del df  # Free memory
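
The sketch below illustrates this with a weak reference, which observes an object without keeping it alive. Payload is a hypothetical stand-in for a large object, and the immediate reclamation shown is CPython behavior; other interpreters may free the object later.

```python
import weakref

class Payload:
    """Hypothetical stand-in for a large object such as a DataFrame."""

first = Payload()
second = first              # a second name for the same object
probe = weakref.ref(first)  # observes the object without owning it

del first
alive_after_first_del = probe() is not None   # True: `second` still refers to it

del second
alive_after_second_del = probe() is not None  # False on CPython: no references remain

print(alive_after_first_del, alive_after_second_del)
```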

### Apply operations in place 

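In-place operations modify a data structure directly instead of returning a new one. For built-in containers this avoids allocating a second object (for example, list.sort() versus sorted()). In pandas, inplace=True does not guarantee a memory saving, because many methods still build a copy internally; reassigning the result also lets the old DataFrame be freed once nothing else refers to it.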

In [None]:
import pandas as pd
df = pd.read_csv('large_file.csv')

# Reassignment: dropna() returns a new DataFrame and the name is rebound
df = df.dropna()

# In place: df itself is modified and the call returns None
df.dropna(inplace=True)
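
Whichever form is used, the two should not be mixed: inplace=True makes the method return None, so assigning its result replaces the DataFrame with None. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# Reassignment: dropna returns a new DataFrame.
cleaned = df.dropna()

# In place: the copy is modified and the call returns None.
in_place = df.copy()
returned = in_place.dropna(inplace=True)

print(returned)                  # None
print(cleaned.equals(in_place))  # True
```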

### Use 'gc' to manage garbage collection

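CPython frees most objects automatically through reference counting. The gc module handles what reference counting cannot: groups of objects that refer to each other in a cycle. Calling gc.collect() forces a collection pass immediately, which can help after large data manipulations that leave such cycles behind.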

In [None]:
import gc

# Force garbage collection
gc.collect()
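
The sketch below builds a two-object reference cycle to show a case that del alone does not clean up. Node is a hypothetical class, and automatic collection is paused only to keep the demonstration deterministic.

```python
import gc
import weakref

class Node:
    """Hypothetical object that can point at another Node."""
    def __init__(self):
        self.partner = None

gc.disable()  # pause automatic collection for the demonstration

a, b = Node(), Node()
a.partner, b.partner = b, a  # each object keeps the other alive
probe = weakref.ref(a)

del a, b
alive_before = probe() is not None  # True: the cycle still holds both objects

unreachable = gc.collect()          # number of unreachable objects found
alive_after = probe() is not None   # False: the cycle has been collected

gc.enable()
print(alive_before, alive_after, unreachable)
```

In normal operation the collector runs on its own schedule, so an explicit call mainly controls when the memory is reclaimed.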

### Use efficient aggregation methods

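Built-in aggregations referenced by name (such as 'sum') run in pandas' optimized compiled code, whereas a custom Python function passed to agg or apply is called once per group and is typically much slower. Aggregating only the columns you need also avoids building intermediate results for the rest.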

In [None]:
import pandas as pd

df = pd.read_csv('large_file.csv')
result = df.groupby('column', as_index=False).agg({'value': 'sum'})
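
The sketch below runs the same aggregation both ways on a small made-up frame to show that the results match. The column names follow the example above; the data is invented.

```python
import pandas as pd

df = pd.DataFrame({
    "column": ["a", "b", "a", "b"],
    "value": [1, 2, 3, 4],
})

# Built-in aggregation referenced by name.
by_name = df.groupby("column", as_index=False).agg({"value": "sum"})

# Same result through a Python callable, invoked once per group.
by_lambda = df.groupby("column", as_index=False).agg({"value": lambda s: s.sum()})

print(by_name)
```

On a frame this small the timing difference is not measurable; it becomes relevant as the number of groups grows.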