## Data Manipulation with Pandas and NumPy
Data manipulation is the process of changing, organizing, or transforming data to make it more useful, readable, or suitable for analysis. It involves tasks like cleaning, filtering, sorting, grouping, or calculating new values from existing data.

Tools like Pandas and NumPy, popular Python libraries, are often used to streamline these tasks. Pandas excels at handling structured data, like tables of crypto trades, for filtering or grouping, while NumPy supports fast numerical computations, such as calculating average prices or returns.

For instance, a crypto investor might manipulate data by filtering trades to show only Bitcoin transactions over $5,000, sorting them by timestamp to track price movements, and aggregating daily totals to assess trading volume. This helps uncover trends, optimize strategies, and ensure data accuracy, ultimately supporting better decisions in the fast-paced crypto market.
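The three steps in that example (filter, sort, aggregate) can be sketched in Pandas as follows. This is a minimal illustration on a small, made-up trade log; the column names `coin`, `timestamp`, and `amount_usd` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical trade log: names and values are invented for illustration
trades = pd.DataFrame({
    "coin": ["Bitcoin", "Ethereum", "Bitcoin", "Bitcoin", "Ripple"],
    "timestamp": pd.to_datetime([
        "2024-01-02 15:30", "2024-01-01 09:00", "2024-01-01 12:00",
        "2024-01-01 10:15", "2024-01-02 11:00",
    ]),
    "amount_usd": [7200.0, 9000.0, 12500.0, 3100.0, 6400.0],
})

# 1. Filter: keep only Bitcoin trades worth more than $5,000
btc_large = trades[(trades["coin"] == "Bitcoin") & (trades["amount_usd"] > 5000)]

# 2. Sort: oldest trade first, to follow activity over time
btc_large = btc_large.sort_values("timestamp")

# 3. Aggregate: total traded value per calendar day
daily_totals = btc_large.groupby(btc_large["timestamp"].dt.date)["amount_usd"].sum()
print(daily_totals)
```

Each step returns a new object rather than modifying `trades`, so the raw data stays available for other questions.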



## Pandas
Pandas is a popular Python library used for working with data. It helps you load, clean, analyze, and manipulate data easily. Think of it like an Excel spreadsheet in Python! It’s great for handling tables of data.

### Key Concepts
Series: A single column of data (like a list with labels).

DataFrame: A table of rows and columns, where each column is a Series.

Index: Labels for rows, helping you identify and access data.

You can load data from files (CSV, Excel, etc.), manipulate it, and save it back.
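As a small sketch of these ideas, the snippet below builds a labeled Series, looks up a value by its index label, and round-trips the data through CSV. To keep it self-contained it writes to an in-memory `io.StringIO` buffer instead of a real file; the coin prices are sample values.

```python
import io

import pandas as pd

# A Series: one column of values, labeled by an index
prices = pd.Series([45000, 3000, 120], index=["Bitcoin", "Ethereum", "Litecoin"], name="Price")
print(prices["Ethereum"])  # look up a value by its index label

# Save as CSV and load it back; the buffer stands in for a file on disk
buffer = io.StringIO()
prices.to_csv(buffer)
buffer.seek(0)  # rewind so read_csv starts at the beginning
reloaded = pd.read_csv(buffer, index_col=0)  # first CSV column becomes the index
print(reloaded)
```

With a real file, pass a path such as `'prices.csv'` to `to_csv()` and `read_csv()` instead of the buffer.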

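### Installing/Importing Pandas

If Pandas is not installed yet, install it from the command line with `pip install pandas`. Then import it under its conventional alias, `pd`:
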
In [7]:
import pandas as pd

### Create a DataFrame
A DataFrame is like a table with rows and columns. Let’s create one with sample cryptocurrency data (e.g., coin names, prices, and trade volumes).

In [None]:


data = {
    'Coin': ['Bitcoin', 'Ethereum', 'Ripple', 'Litecoin'],
    'Price': [45000, 3000, 0.85, 120],
    'Volume': [15000, 8000, 200000, 50000]
}

df = pd.DataFrame(data)  # each dictionary key becomes a column
print(df)

       Coin     Price  Volume
0   Bitcoin  45000.00   15000
1  Ethereum   3000.00    8000
2    Ripple      0.85  200000
3  Litecoin    120.00   50000


### Explore Data


In [14]:
# get the first 2 rows

print(df.head(2))



       Coin    Price  Volume
0   Bitcoin  45000.0   15000
1  Ethereum   3000.0    8000


In [15]:
# get the last 2 rows

print(df.tail(2))



       Coin   Price  Volume
2    Ripple    0.85  200000
3  Litecoin  120.00   50000


In [None]:
# summary of the DataFrame: row count, column names, non-null counts, dtypes, memory use
# info() prints its report directly (and returns None), so it is not wrapped in print()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Coin    4 non-null      object 
 1   Price   4 non-null      float64
 2   Volume  4 non-null      int64  
dtypes: float64(1), int64(1), object(1)
memory usage: 228.0+ bytes
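For a quicker look than `info()`, the `shape`, `columns`, and `dtypes` attributes return the same facts as plain values you can use in code. The sketch below recreates the sample DataFrame so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "Coin": ["Bitcoin", "Ethereum", "Ripple", "Litecoin"],
    "Price": [45000, 3000, 0.85, 120],
    "Volume": [15000, 8000, 200000, 50000],
})

print(df.shape)          # (rows, columns)
print(list(df.columns))  # column names
print(df.dtypes)         # data type of each column
```

`Price` is stored as `float64` because one value (0.85) is not a whole number, so Pandas uses a single float type for the entire column.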


### Basic Data Manipulation



In [20]:
# select one or more columns by passing a list of column names

print(df[['Coin', 'Price']]) # outer [] selects, inner [] is the list of names; returns a DataFrame

       Coin     Price
0   Bitcoin  45000.00
1  Ethereum   3000.00
2    Ripple      0.85
3  Litecoin    120.00


In [25]:
print(df['Coin'])  # a single column name in single brackets returns a Series

0     Bitcoin
1    Ethereum
2      Ripple
3    Litecoin
Name: Coin, dtype: object
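The two selections above return different types, which matters when you chain further operations. This sketch checks the types and also uses `.loc`, which selects by row label and column name at the same time.

```python
import pandas as pd

df = pd.DataFrame({
    "Coin": ["Bitcoin", "Ethereum", "Ripple", "Litecoin"],
    "Price": [45000, 3000, 0.85, 120],
    "Volume": [15000, 8000, 200000, 50000],
})

one_col = df["Coin"]              # single brackets -> Series
two_cols = df[["Coin", "Price"]]  # list of names -> DataFrame
print(type(one_col).__name__, type(two_cols).__name__)

# .loc[row_label, column_name]: here row label 0 and the "Price" column
print(df.loc[0, "Price"])
```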


### Filtering Rows



In [30]:
# get all rows where price > 100

print(df[df['Price'] > 100])


       Coin    Price  Volume
0   Bitcoin  45000.0   15000
1  Ethereum   3000.0    8000
3  Litecoin    120.0   50000
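Conditions can also be combined. In Pandas, use `&` (and) and `|` (or) with each condition wrapped in parentheses, rather than Python's `and`/`or`. `isin()` keeps rows whose value appears in a given list. A short sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Coin": ["Bitcoin", "Ethereum", "Ripple", "Litecoin"],
    "Price": [45000, 3000, 0.85, 120],
    "Volume": [15000, 8000, 200000, 50000],
})

# Both conditions must hold: price above 100 AND volume above 10,000
mask = (df["Price"] > 100) & (df["Volume"] > 10000)
print(df[mask])

# Keep rows whose coin name is in the list
picked = df[df["Coin"].isin(["Bitcoin", "Ripple"])]
print(picked)
```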


### Sort Data

Use `sort_values()` to sort rows by a column in ascending or descending order.

In [34]:
# sort by Price in ascending order

print(df.sort_values('Price')) # ascending order is the default

       Coin     Price  Volume
2    Ripple      0.85  200000
3  Litecoin    120.00   50000
1  Ethereum   3000.00    8000
0   Bitcoin  45000.00   15000


In [38]:
# sort price in descending order

print(df.sort_values('Price', ascending=False))

       Coin     Price  Volume
0   Bitcoin  45000.00   15000
1  Ethereum   3000.00    8000
3  Litecoin    120.00   50000
2    Ripple      0.85  200000
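When you only need the top or bottom few rows, `nlargest()` and `nsmallest()` are a shorthand for sorting and then taking the first rows. A sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Coin": ["Bitcoin", "Ethereum", "Ripple", "Litecoin"],
    "Price": [45000, 3000, 0.85, 120],
    "Volume": [15000, 8000, 200000, 50000],
})

top2 = df.nlargest(2, "Volume")     # the 2 rows with the highest Volume
cheapest = df.nsmallest(1, "Price")  # the row with the lowest Price
print(top2)
print(cheapest)
```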


### Add New Column


In [39]:
# suppose there's a 2% increase in price, create a new column

df['price_after_2pct'] = df['Price'] * 1.02
print(df)

       Coin     Price  Volume  price_after_2pct
0   Bitcoin  45000.00   15000         45900.000
1  Ethereum   3000.00    8000          3060.000
2    Ripple      0.85  200000             0.867
3  Litecoin    120.00   50000           122.400


In [None]:
# equivalent approach: add 2% of the price to the original price

df['price_after_2pct'] = (df['Price'] * 0.02) + df['Price'] # same result as multiplying by 1.02
print(df)

       Coin     Price  Volume  price_after_2pct
0   Bitcoin  45000.00   15000         45900.000
1  Ethereum   3000.00    8000          3060.000
2    Ripple      0.85  200000             0.867
3  Litecoin    120.00   50000           122.400
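A new column can also be computed from two existing columns; the arithmetic is applied row by row. As an illustration, the sketch multiplies `Price` by `Volume`. The column name `Value` is hypothetical, and the product only means "traded value" if `Volume` counts coins traded.

```python
import pandas as pd

df = pd.DataFrame({
    "Coin": ["Bitcoin", "Ethereum", "Ripple", "Litecoin"],
    "Price": [45000, 3000, 0.85, 120],
    "Volume": [15000, 8000, 200000, 50000],
})

# Element-wise: row 0 uses Price[0] * Volume[0], row 1 uses Price[1] * Volume[1], ...
df["Value"] = df["Price"] * df["Volume"]
print(df[["Coin", "Value"]])
```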


In [43]:
df.describe() # summary statistics (count, mean, std, quartiles) of the numeric columns

| | Price | Volume | price_after_2pct |
|---|---|---|---|
| count | 4.0 | 4.0 | 4.0 |
| mean | 12030.2125 | 68250.0 | 12270.81675 |
| std | 22023.550649 | 89734.330108 | 22464.021662 |
| min | 0.85 | 8000.0 | 0.867 |
| 25% | 90.2125 | 13250.0 | 92.01675 |
| 50% | 1560.0 | 32500.0 | 1591.2 |
| 75% | 13500.0 | 87500.0 | 13770.0 |
| max | 45000.0 | 200000.0 | 45900.0 |


In [44]:
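# save the DataFrame to a CSV file (index=False leaves out the row index), then load it back
df.to_csv('crypto_data.csv', index=False)
df_loaded = pd.read_csv('crypto_data.csv')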

## NumPy

NumPy is a Python library for numerical computing. It is built around arrays: grids of numbers that all share one data type, on which mathematical operations run quickly without explicit Python loops. It is a foundation for data manipulation, often used alongside Pandas (which stores DataFrame columns in NumPy-compatible arrays), and is well suited to calculations and statistics on crypto-related data. Below, I'll walk you through the basics step by step: installing NumPy, creating arrays, and performing common operations with simple code.
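To illustrate how the two libraries work together, the sketch below pulls a Pandas column out as a NumPy array with `to_numpy()` and computes the same average both ways. The data is the sample table from the Pandas section.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Coin": ["Bitcoin", "Ethereum", "Ripple", "Litecoin"],
    "Price": [45000, 3000, 0.85, 120],
})

price_array = df["Price"].to_numpy()  # Pandas column -> NumPy array
print(type(price_array).__name__, price_array.dtype)

# The same average, computed by NumPy and by Pandas
print(np.mean(price_array), df["Price"].mean())
```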

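### Installing and Importing NumPy

If NumPy is not installed yet, install it with `pip install numpy`. Then import it under its conventional alias, `np`:
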
In [45]:
import numpy as np

### Create a NumPy Array

A NumPy array is like a list but optimized for math and faster operations. Let’s create arrays with sample cryptocurrency data (e.g., coin prices and trade volumes).

In [47]:
prices = np.array([45000, 50000, 40000, 20000])
print(prices)

[45000 50000 40000 20000]


In [48]:
# shape of the array: one entry per dimension, so (4,) means 1-D with 4 elements

prices = np.array([45000, 50000, 40000, 20000])
print(prices.shape)

(4,)
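If you only want the length, `len()`, `size`, and `ndim` give single numbers instead of a tuple. For a 1-D array, `len()` and `size` agree; for a 2-D array, `len()` counts only rows.

```python
import numpy as np

prices = np.array([45000, 50000, 40000, 20000])
print(len(prices))   # number of elements along the first axis
print(prices.size)   # total number of elements
print(prices.ndim)   # number of dimensions
```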


### Check the Data Type

All elements of a NumPy array share one data type, stored in the array's `dtype` attribute.

In [49]:
print(prices.dtype)

int64
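Because an array holds a single data type, NumPy picks one that fits all the values: an array mixing integers and decimals becomes a float array. You can also convert explicitly with `astype()`. A short sketch:

```python
import numpy as np

prices = np.array([45000, 50000, 40000, 20000])  # whole numbers -> an integer dtype
as_float = prices.astype(np.float64)             # explicit conversion to 64-bit floats
mixed = np.array([45000, 0.85])                  # one decimal value -> whole array is float

print(prices.dtype, as_float.dtype, mixed.dtype)
```

The exact integer type can differ between platforms and NumPy versions, which is why the sketch checks only that it is some integer type.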


### Basic Data Manipulation


In [51]:
# add 1000 to all prices

prices = np.array([45000, 50000, 40000, 20000])
prices_plus_1000 = prices + 1000
print(prices_plus_1000)

[46000 51000 41000 21000]


In [52]:
# 2% price increase: multiply every element by 1.02 (result is a float array)
prices = np.array([45000, 50000, 40000, 20000])
price_increase = prices * 1.02

print(price_increase)


[45900. 51000. 40800. 20400.]


In [53]:
# boolean indexing: keep only the prices above 20,000

higher_price = prices[prices > 20000]
print(higher_price)

[45000 50000 40000]
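The same element-wise arithmetic computes returns, which were mentioned at the start as a typical NumPy task. This sketch assumes, hypothetically, that the four prices are consecutive closing prices of one coin; the simple return for each step is (p[t] - p[t-1]) / p[t-1].

```python
import numpy as np

# Assumption: consecutive closing prices of a single coin
prices = np.array([45000, 50000, 40000, 20000])

# np.diff gives p[t] - p[t-1]; dividing by prices[:-1] gives the simple return
returns = np.diff(prices) / prices[:-1]
print(returns)
print(returns.mean())  # average return per step
```

The result has one fewer element than `prices`, because the first price has no previous price to compare with.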


### Basic Statistical Functions

In [66]:
volume = np.array([500000, 300000, 200000, 100000])
print("\nStatistics for volume: ") # "\n" prints a blank line before the heading

print("Mean (Average): ", np.mean(volume)) # average volume
print("Maximum: ", np.max(volume)) # highest volume
print("Minimum: ", np.min(volume)) # lowest volume
print("Sum: ", np.sum(volume)) # total volume
print("Standard Deviation: ", np.std(volume)) # spread around the mean (population std, ddof=0)




Statistics for volume: 
Mean (Average):  275000.0
Maximum:  500000
Minimum:  100000
Sum:  1100000
Standard Deviation:  147901.9945774904
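By default `np.std()` divides by n (population standard deviation), while Pandas' `.std()` and `describe()` divide by n - 1 (sample standard deviation). That is why the two libraries can report different numbers for the same data. The `ddof` parameter makes them agree:

```python
import numpy as np
import pandas as pd

volume = np.array([500000, 300000, 200000, 100000])

pop_std = np.std(volume)              # ddof=0: divide by n (NumPy default)
sample_std = np.std(volume, ddof=1)   # ddof=1: divide by n - 1
pandas_std = pd.Series(volume).std()  # Pandas default is ddof=1

print(pop_std, sample_std, pandas_std)
```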


### Work with 2D Arrays

NumPy can handle multi-dimensional arrays. Let’s combine our data into a 2D array (like a table).

In [67]:
crypto_data = np.array([
    [45000, 5000],
    [30000, 28000],
    [11998, 8943]
])

print(crypto_data)

[[45000  5000]
 [30000 28000]
 [11998  8943]]


In [70]:
# access the 2nd row (indexing starts at 0, so index 1 is the second row)

crypto_data[1]

array([30000, 28000])

In [72]:
# access the 1st column: ":" selects every row, 0 selects the first column

crypto_data[:, 0]

array([45000, 30000, 11998])

In [75]:
# access the 2nd column

crypto_data[:, 1]

array([ 5000, 28000,  8943])
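Statistics on a 2D array can run down the columns or across the rows, chosen with the `axis` argument. This sketch assumes, as a hypothetical interpretation, that column 0 holds prices and column 1 holds volumes, and it also multiplies the two columns element-wise.

```python
import numpy as np

# Assumption: each row is one coin, laid out as [price, volume]
crypto_data = np.array([
    [45000, 5000],
    [30000, 28000],
    [11998, 8943],
])

print(crypto_data.shape)              # (rows, columns)
col_means = crypto_data.mean(axis=0)  # axis=0: average of each column
print(col_means)

# Element-wise product of the two columns: price * volume for each row
trade_value = crypto_data[:, 0] * crypto_data[:, 1]
print(trade_value)
```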