# Data Manipulation with Pandas and NumPy

Data manipulation is the process of changing, organizing, or transforming data to make it more useful, readable, or suitable for analysis. It involves tasks like cleaning, filtering, sorting, grouping, or calculating new values from existing data.

Tools like Pandas and NumPy, popular Python libraries, are often used to streamline these tasks. Pandas excels at handling structured data, like tables of crypto trades, for filtering or grouping, while NumPy supports fast numerical computations, such as calculating average prices or returns.

For instance, a crypto investor might manipulate data by filtering trades to show only Bitcoin transactions over $5,000, sorting them by timestamp to track price movements, and aggregating daily totals to assess trading volume. This helps uncover trends, optimize strategies, and ensure data accuracy, ultimately supporting better decisions in the fast-paced crypto market.
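
The workflow described above (filter, sort, aggregate) can be sketched in a few lines of Pandas. This is only an illustration: the `trades` table, its column names, and every value in it are invented for the example and do not come from a real data source.

```python
import pandas as pd

# Hypothetical trade records, invented purely for illustration
trades = pd.DataFrame({
    'coin': ['Bitcoin', 'Ethereum', 'Bitcoin', 'Bitcoin', 'Bitcoin'],
    'amount_usd': [7500, 6200, 3000, 12000, 5500],
    'timestamp': pd.to_datetime([
        '2024-01-02 14:30', '2024-01-01 09:15', '2024-01-01 11:00',
        '2024-01-01 16:45', '2024-01-02 08:05',
    ]),
})

# 1. Filter: keep only Bitcoin trades above $5,000
big_btc = trades[(trades['coin'] == 'Bitcoin') & (trades['amount_usd'] > 5000)]

# 2. Sort: order the remaining trades chronologically
big_btc = big_btc.sort_values('timestamp')

# 3. Aggregate: total traded amount per calendar day
daily_totals = big_btc.groupby(big_btc['timestamp'].dt.date)['amount_usd'].sum()

print(big_btc)
print(daily_totals)
```

Each step returns a new object, so the original `trades` table is left untouched and the steps can be inspected one at a time.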

# Pandas

Pandas is a popular Python library used for working with data. It helps you load, clean, analyze, and manipulate data easily. Think of it like an Excel spreadsheet in Python! It’s great for handling tables of data.

# Key Concepts
DataFrame: A table of rows and columns, where each column is a Series.

Series: A single column of data (like a list with labels).

Index: Labels for rows, helping you identify and access data.

You can load data from files (CSV, Excel, etc.), manipulate it, and save it back.
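
To make the Series and Index ideas concrete, here is a minimal sketch. The variable name `prices` and the choice of coin names as labels are assumptions made for the example.

```python
import pandas as pd

# A Series: one column of values, each paired with an index label
prices = pd.Series([45000, 3000, 0.85], index=['Bitcoin', 'Ethereum', 'Ripple'])

print(prices.index.tolist())  # the row labels
print(prices['Ethereum'])     # look up a value by its label
```

When no index is supplied, Pandas falls back to the positions 0, 1, 2, and so on as labels.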

### Installing / Importing Pandas

In [1]:
# If Pandas is not installed yet, run this in a terminal first: pip install pandas
import pandas as pd

# Create a DataFrame

A DataFrame is like a table with rows and columns. Let’s create one with sample cryptocurrency data (e.g., coin names, prices, and trade volumes).

In [2]:
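# Create a dictionary with crypto data.
# Ethereum's Volume is left as a blank string, so Pandas stores the
# Volume column with the generic object dtype instead of as numbers.

data = {
    'Coin': ['Bitcoin', 'Ethereum', 'Ripple', 'Litecoin'],
    'Price': [45000, 3000, 0.85, 120],
    'Volume': [15000, " ", 200000, 50000]
}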

df = pd.DataFrame(data)

print(df)

       Coin     Price  Volume
0   Bitcoin  45000.00   15000
1  Ethereum   3000.00        
2    Ripple      0.85  200000
3  Litecoin    120.00   50000


#### Explore Data

In [3]:
df

       Coin     Price  Volume
0   Bitcoin  45000.00   15000
1  Ethereum   3000.00        
2    Ripple      0.85  200000
3  Litecoin    120.00   50000


In [4]:
print(df.head(2))

       Coin    Price  Volume
0   Bitcoin  45000.0   15000
1  Ethereum   3000.0        


In [5]:
print(df.tail(2))

       Coin   Price  Volume
2    Ripple    0.85  200000
3  Litecoin  120.00   50000


In [7]:
df.head(2)

       Coin    Price  Volume
0   Bitcoin  45000.0   15000
1  Ethereum   3000.0        


## Get Info

In [13]:
# info() prints its report directly and returns None, so print() is not needed
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Coin    4 non-null      object 
 1   Price   4 non-null      float64
 2   Volume  4 non-null      object 
dtypes: float64(1), object(2)
memory usage: 224.0+ bytes
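
The report lists Volume as `object` rather than a numeric type because one of its entries is a blank string. One common way to handle this, sketched here on a standalone Series rather than the notebook's `df`, is `pd.to_numeric` with `errors='coerce'`. Whether a blank should be treated as missing is an assumption about the data, not something the source states.

```python
import pandas as pd

volume = pd.Series([15000, " ", 200000, 50000])

# errors='coerce' turns anything that cannot be parsed as a number into NaN
volume_numeric = pd.to_numeric(volume, errors='coerce')

print(volume_numeric.dtype)         # now a floating-point column
print(volume_numeric.isna().sum())  # how many entries are missing
```

Once the column is numeric, it takes part in calculations and shows up in summary statistics.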


### Basic Data Manipulation

In [14]:
# Select specific columns (Coin and Price)

print(df[['Coin', 'Price']])

       Coin     Price
0   Bitcoin  45000.00
1  Ethereum   3000.00
2    Ripple      0.85
3  Litecoin    120.00


### Filter Rows

In [16]:
# Keep only the rows where Price is greater than 100
print(df[df['Price'] > 100])

       Coin    Price Volume
0   Bitcoin  45000.0  15000
1  Ethereum   3000.0       
3  Litecoin    120.0  50000
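
Filters can also combine several conditions. The sketch below rebuilds a small version of the table so it runs on its own; the name `mid_range` and the 10000 upper bound are arbitrary choices for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Coin': ['Bitcoin', 'Ethereum', 'Ripple', 'Litecoin'],
    'Price': [45000, 3000, 0.85, 120],
})

# Wrap each condition in parentheses; & means "and", | means "or"
mid_range = df[(df['Price'] > 100) & (df['Price'] < 10000)]

print(mid_range)
```

The parentheses matter: `&` binds more tightly than `>` and `<` in Python, so leaving them out raises an error.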


### Sort Data

Sort by a column, such as Price, in ascending or descending order.

In [None]:
print(df.sort_values('Price'))  # Sort in ascending order (the default)

       Coin     Price  Volume
2    Ripple      0.85  200000
3  Litecoin    120.00   50000
1  Ethereum   3000.00        
0   Bitcoin  45000.00   15000


In [None]:
print(df.sort_values('Price', ascending=False)) # Sort in descending order

       Coin     Price  Volume
0   Bitcoin  45000.00   15000
1  Ethereum   3000.00        
3  Litecoin    120.00   50000
2    Ripple      0.85  200000
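
Note that the sorted outputs keep the original row labels (0 to 3). A small sketch, assuming you would rather renumber the rows after sorting, uses `reset_index(drop=True)`; the name `by_price` is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Coin': ['Bitcoin', 'Ethereum', 'Ripple', 'Litecoin'],
    'Price': [45000, 3000, 0.85, 120],
})

# sort_values returns a new DataFrame; df itself keeps its original order
by_price = df.sort_values('Price', ascending=False).reset_index(drop=True)

print(by_price)
print(df['Coin'].tolist())  # original order is unchanged
```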


## Add a New Column

In [21]:
# Add a column holding each price after a 2% increase
df['Price_after_2Pct'] = df['Price'] * 1.02
print(df)

       Coin     Price  Volume  Price_after_2Pct
0   Bitcoin  45000.00   15000         45900.000
1  Ethereum   3000.00                  3060.000
2    Ripple      0.85  200000             0.867
3  Litecoin    120.00   50000           122.400


In [22]:
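# Summary statistics for the numeric columns (Volume is skipped because it is not numeric)
df.describe()

              Price  Price_after_2Pct
count      4.000000          4.000000
mean   12030.212500      12270.816750
std    22023.550649      22464.021662
min        0.850000          0.867000
25%       90.212500         92.016750
50%     1560.000000       1591.200000
75%    13500.000000      13770.000000
max    45000.000000      45900.000000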


In [24]:
# Save the DataFrame to a CSV file; index=False leaves out the row labels
df.to_csv('crypto_data.csv', index=False)
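
A saved file can be loaded back with `pd.read_csv`. The sketch below writes to a temporary folder so that it runs anywhere without leaving files behind; the two-row table is a trimmed stand-in for the notebook's data.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Coin': ['Bitcoin', 'Ethereum'],
    'Price': [45000.0, 3000.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'crypto_data.csv')
    df.to_csv(path, index=False)  # write the table to disk
    loaded = pd.read_csv(path)    # read the same file back into a DataFrame

print(loaded)
```

In a notebook you would normally pass the file name directly, for example `pd.read_csv('crypto_data.csv')`.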

# NumPy

NumPy is a Python library for numerical computation. Its core object is the array, a collection of values that all share one data type, which makes mathematical operations much faster than on plain Python lists. Pandas is built on top of NumPy, so the two are often used together for calculations and statistics on data such as crypto prices. This section covers the basics step by step: importing NumPy, creating arrays, and performing common operations.

#### Installing and Importing NumPy

In [4]:
import numpy as np

### Create a NumPy Array

A NumPy array is like a list but optimized for math and faster operations. Let’s create arrays with sample cryptocurrency data (e.g., coin prices and trade volumes).

In [5]:
prices = np.array([45000, 50000, 40000, 20000])
print(prices)


[45000 50000 40000 20000]


In [6]:
print(prices.shape)

(4,)


#### Data Type

In [7]:
print(prices.dtype)

int64
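
Every element of a NumPy array shares one data type, which NumPy infers from the values. A brief sketch follows; the variable names are illustrative, and the exact integer width can differ by platform.

```python
import numpy as np

int_prices = np.array([45000, 50000])     # whole numbers give an integer dtype
float_prices = np.array([45000, 0.85])    # one decimal makes the whole array float64
as_float = int_prices.astype(np.float64)  # explicit conversion to float

print(int_prices.dtype, float_prices.dtype, as_float.dtype)
```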


### Basic Data Manipulation

In [None]:
# Add 1000 to all prices

prices = np.array([45000, 50000, 40000, 20000])

prices_plus_1000 = prices + 1000

print(prices_plus_1000)

[46000 51000 41000 21000]
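
For comparison, here is the same update written for a plain Python list and for a NumPy array. The names `price_list`, `list_result`, and `array_result` are made up for the example.

```python
import numpy as np

price_list = [45000, 50000, 40000, 20000]

# Plain Python list: update each element explicitly
list_result = [p + 1000 for p in price_list]

# NumPy array: the scalar is applied to every element at once (broadcasting)
array_result = np.array(price_list) + 1000

print(list_result)
print(array_result)
```

Both give the same numbers; the array version is shorter and, on large inputs, considerably faster.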


In [8]:
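# Compute a 2% increase in price

# Prices after the 1000 increase (used in the filter step that follows)
price_s = np.array([46000, 51000, 41000, 21000])

# The 2% increase is applied to the original prices array
price_increase = prices * 1.02

print(price_increase)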



[45900. 51000. 40800. 20400.]
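
The same element-wise arithmetic can be used to compute returns between consecutive prices. This is a minimal sketch with invented closing prices; `closes` and `returns` are illustrative names, and simple (not logarithmic) returns are assumed.

```python
import numpy as np

# Hypothetical closing prices on four consecutive days
closes = np.array([40000.0, 42000.0, 39900.0, 41895.0])

# Simple return: (today - yesterday) / yesterday
returns = np.diff(closes) / closes[:-1]

# Show the returns as percentages, rounded to two decimals
print(np.round(returns * 100, 2))
```

`np.diff` yields one fewer element than its input, which is why the divisor drops the last price with `closes[:-1]`.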


In [32]:
# Filter data: keep only prices above 20000 (every value qualifies here)

higher_price = price_s[price_s > 20000]
print(higher_price)

[46000 51000 41000 21000]
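
Because every price is above 20000, that filter returns the whole array. A stricter threshold, chosen arbitrarily here, shows more clearly how the boolean mask works.

```python
import numpy as np

prices = np.array([46000, 51000, 41000, 21000])

mask = prices > 45000  # one True/False value per element

print(mask)
print(prices[mask])    # only the elements where mask is True
print(mask.sum())      # True counts as 1, so this counts the matches
```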


## Basic Statistical Functions

In [34]:
volume = np.array([500000, 300000, 200000, 100000])

print("\nStatistics for Volume: ")

print("Mean (Average): ", np.mean(volume)) # Average Volume
print("Minimum: ", np.min(volume)) # Minimum volume
print("Maximum: ", np.max(volume)) # Maximum Volume
print("Sum: ", np.sum(volume)) # Total sum of volumes
print("Standard Dev (Volatility): ", np.std(volume))


Statistics for Volume: 
Mean (Average):  275000.0
Minimum:  100000
Maximum:  500000
Sum:  1100000
Standard Dev (Volatility):  147901.9945774904
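
One detail worth knowing: `np.std` computes the population standard deviation by default (it divides by n), whereas Pandas' `describe()` reports the sample standard deviation (it divides by n - 1). The sketch below shows how the `ddof` argument switches between the two; the variable names are illustrative.

```python
import numpy as np

volume = np.array([500000, 300000, 200000, 100000])

population_std = np.std(volume)      # ddof=0 (default): divide by n
sample_std = np.std(volume, ddof=1)  # ddof=1: divide by n - 1

print(population_std)
print(sample_std)
```

With only four values the two results differ noticeably, so it helps to state which one is being reported.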


### Work with 2D Arrays

NumPy can handle multi-dimensional arrays. Let’s combine our data into a 2D array (like a table).

In [None]:
# Create 2D array

crypto_data = np.array([
    [45000, 5000],
    [30000, 28000],
    [11998, 8943]
])


print(crypto_data)

[[45000  5000]
 [30000 28000]
 [11998  8943]]


In [38]:
# Access the third row (index 2)

crypto_data[2]

array([11998,  8943])

In [40]:
# Access the second column (index 1) across all rows

crypto_data[:,1]

array([ 5000, 28000,  8943])
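
Beyond whole rows and columns, a 2D array supports single-element access and per-axis calculations. The sketch below reuses the same numbers; what each column represents is not stated in the source, so the comments refer only to positions.

```python
import numpy as np

crypto_data = np.array([
    [45000, 5000],
    [30000, 28000],
    [11998, 8943],
])

print(crypto_data.shape)         # (number of rows, number of columns)
print(crypto_data[1, 0])         # single element: row 1, column 0
print(crypto_data.mean(axis=0))  # one mean per column
print(crypto_data.max(axis=1))   # one maximum per row
```

`axis=0` collapses the rows and leaves one result per column, while `axis=1` collapses the columns and leaves one result per row.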