### Data Manipulation with Pandas and NumPy
#### Data manipulation is the process of changing, organizing, or transforming data to make it more useful, readable, or suitable for analysis. It involves tasks like cleaning, filtering, sorting, grouping, or calculating new values from existing data.

#### Tools like Pandas and NumPy, popular Python libraries, are often used to streamline these tasks. Pandas excels at handling structured data, like tables of crypto trades, for filtering or grouping, while NumPy supports fast numerical computations, such as calculating average prices or returns.

#### For instance, a crypto investor might manipulate data by filtering trades to show only Bitcoin transactions over $5,000, sorting them by timestamp to track price movements, and aggregating daily totals to assess trading volume. This helps uncover trends, optimize strategies, and ensure data accuracy, ultimately supporting better decisions in the fast-paced crypto market.

### Pandas
#### Pandas is a popular Python library used for working with data. It helps you load, clean, analyze, and manipulate data easily. Think of it like an Excel spreadsheet in Python! It’s great for handling tables of data.

### Key Concepts
#### Series: A single column of data (like a list with labels).

#### Index: Labels for rows, helping you identify and access data.

#### You can load data from files (CSV, Excel, etc.), manipulate it, and save it back.
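
The concepts above can be sketched with a small Series. The prices and labels here are illustrative assumptions, not real market data.

```python
import pandas as pd

# A Series: one column of values, each paired with an index label
prices = pd.Series(
    [45000, 3000, 0.85],
    index=['Bitcoin', 'Ethereum', 'Ripple'],
    name='Price',
)

# The index holds the row labels
print(prices.index.tolist())

# Look up a value by its label instead of its position
print(prices['Ethereum'])
```

Using coin names as the index means a value can be found by name, which is often easier to read than a numeric position.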



#### Installing / Importing Pandas

In [1]:
import pandas as pd

In [2]:
# Locate the Python interpreters on PATH (Windows; use `which python` on macOS/Linux).
# Keep the comment on its own line: after `!` it would be passed to the shell as arguments.
!where python


c:\Users\g\OneDrive\Documents\OxPython\Module\PythonTut\Scripts\python.exe
C:\Users\g\AppData\Local\Programs\Python\Python313\python.exe
C:\Users\g\AppData\Local\Microsoft\WindowsApps\python.exe


In [2]:
# Install pandas into the environment the notebook kernel is running in
%pip install pandas


In [1]:
import pandas as pd
print(pd.__version__)


2.3.1


### Create a DataFrame
#### A DataFrame is like a table with rows and columns. Let’s create one with sample cryptocurrency data (e.g., coin names, prices, and trade volumes).


In [5]:
# Create a dictionary with crypto data

data = {
    'Coin': ['Bitcoin', 'Ethereum', 'Ripple', 'Litecoin'],
    'Price': [45000, 3000, 0.85, 120],
    'Volume': [15000, ' ', 200000, 50000]
}

df = pd.DataFrame(data)

print(df)

       Coin     Price  Volume
0   Bitcoin  45000.00   15000
1  Ethereum   3000.00        
2    Ripple      0.85  200000
3  Litecoin    120.00   50000


In [6]:
print(df.head(2))

       Coin    Price Volume
0   Bitcoin  45000.0  15000
1  Ethereum   3000.0       


In [7]:
print(df.tail(3))

       Coin    Price  Volume
1  Ethereum  3000.00        
2    Ripple     0.85  200000
3  Litecoin   120.00   50000


In [8]:
# Summarize the DataFrame: columns, non-null counts, and dtypes.
# info() prints its report itself and returns None, so it is not wrapped in print().
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Coin    4 non-null      object 
 1   Price   4 non-null      float64
 2   Volume  4 non-null      object 
dtypes: float64(1), object(2)
memory usage: 228.0+ bytes
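
The `info()` output reports `Volume` as `object` rather than a numeric type because the Ethereum entry is a blank string. One possible way to handle this, assuming the blank should be treated as a missing value, is `pd.to_numeric` with `errors='coerce'`. The sketch below rebuilds the sample data under a separate name (`sample`, chosen for illustration) so the `df` used in the rest of this section is unchanged.

```python
import pandas as pd

# Same sample data as above, rebuilt so this block runs on its own
sample = pd.DataFrame({
    'Coin': ['Bitcoin', 'Ethereum', 'Ripple', 'Litecoin'],
    'Price': [45000, 3000, 0.85, 120],
    'Volume': [15000, ' ', 200000, 50000],
})

# Convert Volume to numbers; values that cannot be parsed become NaN
sample['Volume'] = pd.to_numeric(sample['Volume'], errors='coerce')

print(sample['Volume'].dtype)
print(sample['Volume'].isna().sum())  # number of missing values
```

Once the column is numeric, missing entries can be dropped with `dropna()` or filled with `fillna()`, depending on what the analysis needs.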


### Basic Data Manipulation

In [9]:

# Select specific columns by passing a list of column names

print(df[['Coin', 'Price']])

       Coin     Price
0   Bitcoin  45000.00
1  Ethereum   3000.00
2    Ripple      0.85
3  Litecoin    120.00


### Filter Rows

In [10]:
# Keep only the rows where the price is greater than 100
print(df[df['Price']>100])

       Coin    Price Volume
0   Bitcoin  45000.0  15000
1  Ethereum   3000.0       
3  Litecoin    120.0  50000
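
Filters can also combine several conditions. The sketch below is a minimal example on a rebuilt copy of the sample data; the thresholds and the name `coins` are arbitrary choices for illustration.

```python
import pandas as pd

coins = pd.DataFrame({
    'Coin': ['Bitcoin', 'Ethereum', 'Ripple', 'Litecoin'],
    'Price': [45000, 3000, 0.85, 120],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
mid_priced = coins[(coins['Price'] > 100) & (coins['Price'] < 10000)]
print(mid_priced)

# isin() keeps rows whose value appears in a given list
selected = coins[coins['Coin'].isin(['Bitcoin', 'Ripple'])]
print(selected)
```

Python's `and`/`or` keywords do not work element-wise on a Series, which is why `&` and `|` are used here.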


### Sort Data
#### Sort by a column, such as price, in ascending or descending order.

In [11]:
# Sort coins by price in ascending order (the default)
print(df.sort_values('Price'))

       Coin     Price  Volume
2    Ripple      0.85  200000
3  Litecoin    120.00   50000
1  Ethereum   3000.00        
0   Bitcoin  45000.00   15000


In [12]:
# Sort coins by price in descending order
print(df.sort_values('Price', ascending=False))

       Coin     Price  Volume
0   Bitcoin  45000.00   15000
1  Ethereum   3000.00        
3  Litecoin    120.00   50000
2    Ripple      0.85  200000


### Add a New Column

In [13]:
# Add a column holding each price after a 2% increase
df['price_after_2%'] = df['Price'] * 1.02
print(df)

       Coin     Price  Volume  price_after_2%
0   Bitcoin  45000.00   15000       45900.000
1  Ethereum   3000.00                3060.000
2    Ripple      0.85  200000           0.867
3  Litecoin    120.00   50000         122.400


In [15]:
# Summary statistics for the numeric columns.
# describe is a method, so it needs parentheses to be called.
df.describe()

              Price  price_after_2%
count      4.000000        4.000000
mean   12030.212500    12270.816750
std    22023.550649    22464.021662
min        0.850000        0.867000
25%       90.212500       92.016750
50%     1560.000000     1591.200000
75%    13500.000000    13770.000000
max    45000.000000    45900.000000
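
The introduction mentions grouping and aggregating daily totals, which `describe()` does not cover. A minimal sketch with `groupby` follows. The trade log, its column names, and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical trade log (all values invented for illustration)
trades = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'Coin': ['Bitcoin', 'Ethereum', 'Bitcoin', 'Bitcoin'],
    'Amount': [6000, 1500, 7200, 800],
})

# Total traded amount per day
daily_totals = trades.groupby('Date')['Amount'].sum()
print(daily_totals)

# Several statistics per coin in one call
per_coin = trades.groupby('Coin')['Amount'].agg(['sum', 'mean', 'count'])
print(per_coin)
```

`groupby` splits the rows by the values in one column, applies the aggregation to each group, and returns one result row per group.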


In [16]:
# Save the DataFrame to a CSV file; index=False leaves out the row labels
df.to_csv('CryptoData.csv', index=False)
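
A saved file can be loaded back with `pd.read_csv`. The sketch below writes a small frame to a temporary folder and reads it again. The file name `CryptoData_demo.csv` and the temporary location are assumptions made so the example does not overwrite anything.

```python
import os
import tempfile

import pandas as pd

coins = pd.DataFrame({
    'Coin': ['Bitcoin', 'Ethereum'],
    'Price': [45000, 3000],
})

# Write to a temporary folder, then read the same file back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'CryptoData_demo.csv')
    coins.to_csv(path, index=False)
    loaded = pd.read_csv(path)

print(loaded)
```

Without `index=False`, the row labels are written as an extra unnamed column, which then shows up as `Unnamed: 0` when the file is read back.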

### NumPy
#### NumPy is a Python library for numerical computing. It is built around arrays, which are collections of numbers of a single type that support fast element-wise math. It is often used alongside Pandas and suits calculations and statistics on data such as crypto prices and volumes. The steps below cover importing NumPy, creating arrays, and performing common operations.

#### Installing and Importing NumPy

In [None]:
import numpy as np

### Create a NumPy Array
#### A NumPy array is like a list but optimized for math and faster operations. Let’s create arrays with sample cryptocurrency data (e.g., coin prices and trade volumes).

In [18]:
prices = np.array([ 4000, 23000, 42990, 90200])
print(prices)

[ 4000 23000 42990 90200]


In [19]:
# shape gives the size of the array along each dimension; (4,) means a 1-D array with 4 elements
print(prices.shape)

(4,)


In [20]:
print(prices.dtype)



int64



### Basic Data Manipulation

In [21]:
# Add 1000 to every price; the operation is applied element by element
price_plus_1000 = prices + 1000
print(price_plus_1000)

[ 5000 24000 43990 91200]


In [22]:
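# Increase every price by 50%; the division turns the result into floats
new_prices = np.array([5000, 24000, 43990, 91200])
prices_increased = new_prices + (new_prices * 50 / 100)
print(prices_increased)

[  7500.  36000.  65985. 136800.]


In [23]:
# Filter data: keep only the prices above 30000
prices_higher = prices_increased[prices_increased > 30000]
print(prices_higher)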

[ 36000.  65985. 136800.]
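
The filter above works because the comparison itself produces an array of `True`/`False` values, called a boolean mask. A minimal sketch of what happens under the hood, reusing the same numbers (the labels `'high'` and `'low'` are arbitrary):

```python
import numpy as np

prices = np.array([7500.0, 36000.0, 65985.0, 136800.0])

# The comparison yields one True/False per element
mask = prices > 30000
print(mask)

# Indexing with the mask keeps only the True positions
print(prices[mask])

# True counts as 1, so sum() gives the number of matches
print(mask.sum())

# np.where picks a value per element depending on the mask
labels = np.where(mask, 'high', 'low')
print(labels)
```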


### Basic Statistical Functions

In [24]:
volume = np.array([40000, 70000, 50000, 65000])
print('\nStatistical Summary')
print('Mean:', np.mean(volume))
print('Maximum:', np.max(volume))
print('Minimum:', np.min(volume))
print('Total:', np.sum(volume))
# np.std returns the population standard deviation by default (ddof=0)
print('Standard deviation:', np.std(volume))


Statistical Summary
Mean: 56250.0
Maximum: 70000
Minimum: 40000
Total: 225000
Standard deviation: 11924.24001771182
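
The summary above measures the spread of trade volume. In finance, volatility more commonly refers to the standard deviation of returns, so the sketch below shows one way to compute simple returns from a price series. The closing prices and the name `closes` are hypothetical.

```python
import numpy as np

# Hypothetical closing prices for consecutive periods
closes = np.array([100.0, 110.0, 99.0, 108.9])

# Simple return per period: (current - previous) / previous
returns = np.diff(closes) / closes[:-1]
print(returns)

print('Average return:', returns.mean())

# ddof=1 gives the sample standard deviation, matching the pandas default
print('Volatility:', returns.std(ddof=1))
```

NumPy's `std` uses `ddof=0` unless told otherwise, while pandas uses `ddof=1`, so the two libraries give slightly different results on the same data.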


### Work with 2D Arrays
#### NumPy can handle multi-dimensional arrays. Let’s combine our data into a 2D array (like a table).

In [25]:
# Create a 2D array with 3 rows and 2 columns
crypto_data = np.array([
    [60000, 30000],
    [40000, 10000],
    [70000, 340000]
])
print(crypto_data)

[[ 60000  30000]
 [ 40000  10000]
 [ 70000 340000]]


In [26]:
# Access the third row (index 2)
crypto_data[2]

array([ 70000, 340000])

In [27]:
# Access the second column (index 1) across all rows
crypto_data[:,1]

array([ 30000,  10000, 340000])
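
Statistics on a 2D array can be computed per column or per row with the `axis` argument. A short sketch on the same array:

```python
import numpy as np

crypto_data = np.array([
    [60000, 30000],
    [40000, 10000],
    [70000, 340000],
])

# shape is (rows, columns)
print(crypto_data.shape)

# axis=0 collapses the rows: one mean per column
column_means = crypto_data.mean(axis=0)
print(column_means)

# axis=1 collapses the columns: one total per row
row_totals = crypto_data.sum(axis=1)
print(row_totals)

# A single element: row index 2, column index 1
print(crypto_data[2, 1])
```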