# Ickle
*Data Analysis Library for Python*
<hr />

This Jupyter Notebook serves as the documentation for Ickle.

You can contribute to Ickle here: https://github.com/karishmashuklaa/ickle 

## Table Of Contents
1. [Getting Started](#Getting-Started)
2. [DataFrame and Visual Representation](#DataFrame-and-Visual-Representation)
3. [Basic Properties](#Basic-Properties)
4. [Selection of Subsets](#Selection-of-Subsets)
5. [Basic and Aggregation Methods](#Basic-And-Aggregation-Methods)
6. [Non-Aggregation Methods](#Non-Aggregation-Methods)
7. [Other Methods](#Other-Methods)
8. [Arithmetic and Comparison Operators](#Arithmetic-And-Comparison-Operators)
9. [String-Only Methods](#String-Only-Methods)
10. [Pivot Table](#Pivot-Table)
11. [Read CSV](#Read-CSV)

## Getting Started

### Installation

Ickle can be installed via pip.

`pip install ickle`

### Import

`import ickle as ick`

## DataFrame and Visual Representation

### DataFrame
A `DataFrame` holds two-dimensional heterogeneous data. It accepts a dictionary as input, with strings as keys (the column names) and NumPy arrays as values.

Parameters:
- `data`: A dictionary of strings mapped to NumPy arrays. Each key becomes a column name.

In [278]:
import numpy as np
import ickle as ick

In [279]:
name = np.array(['John', 'Sam', 'Tina', 'Josh', 'Jack', 'Jill'])
place = np.array(['Kolkata', 'Mumbai', 'Delhi', 'Mumbai', 'Mumbai', 'Mumbai'])
weight = np.array([57, 70, 54, 59, 62, 70])
married = np.array([True, False, True, False, False, False])

data = {'name': name, 'place': place, 'weight': weight, 'married': married}
df = ick.DataFrame(data)

### Visual Representation

A `DataFrame` can be displayed as follows:

In [280]:
df

Unnamed: 0,name,place,weight,married
0,John,Kolkata,57,True
1,Sam,Mumbai,70,False
2,Tina,Delhi,54,True
3,Josh,Mumbai,59,False
4,Jack,Mumbai,62,False
5,Jill,Mumbai,70,False


We will use the above `DataFrame` throughout the notebook.

## Basic Properties
1. [len](#len)
2. [columns](#columns)
3. [shape](#shape)
4. [values](#values)
5. [dtypes](#dtypes)


### `len`
returns: the number of rows in the `DataFrame`

In [281]:
len(df)

6

### `columns`
returns: list of column names

In [282]:
df.columns

['name', 'place', 'weight', 'married']

### Modify existing column names

In [283]:
df.columns = ['NAME', 'PLACE', 'WEIGHT', 'MARRIED']
df

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED
0,John,Kolkata,57,True
1,Sam,Mumbai,70,False
2,Tina,Delhi,54,True
3,Josh,Mumbai,59,False
4,Jack,Mumbai,62,False
5,Jill,Mumbai,70,False


### `shape` 
returns: a two-item tuple of the number of rows and columns in the `DataFrame`

In [284]:
df.shape

(6, 4)

### `values`
returns: a single 2D NumPy array of all the columns of data.

In [285]:
df.values

array([['John', 'Kolkata', 57, True],
       ['Sam', 'Mumbai', 70, False],
       ['Tina', 'Delhi', 54, True],
       ['Josh', 'Mumbai', 59, False],
       ['Jack', 'Mumbai', 62, False],
       ['Jill', 'Mumbai', 70, False]], dtype=object)

### `dtypes`
returns: a two-column `DataFrame` of column names in one column and their data type in the other

In [286]:
df.dtypes

Unnamed: 0,Column Name,Data Type
0,NAME,string
1,PLACE,string
2,WEIGHT,int
3,MARRIED,bool


## Selection of Subsets
1. [Select a single column](#Select-a-single-column)
2. [Select multiple columns](#Select-multiple-columns)
3. [Boolean selection](#Boolean-selection)
4. [Simultaneous selection of row and column](#Simultaneous-selection-of-row-and-column)
5. [Add new / Overwrite existing columns](#Add-new-/-Overwrite-existing-columns)

### Select a single column
by passing the name of the column as a string

In [287]:
df['NAME']

Unnamed: 0,NAME
0,John
1,Sam
2,Tina
3,Josh
4,Jack
5,Jill


### Select multiple columns 
by passing column names as a list of strings

In [288]:
df[['NAME', 'PLACE']]

Unnamed: 0,NAME,PLACE
0,John,Kolkata
1,Sam,Mumbai
2,Tina,Delhi
3,Josh,Mumbai
4,Jack,Mumbai
5,Jill,Mumbai


### Boolean Selection

In [289]:
bool_sel = df['WEIGHT'] > 60

df[bool_sel]

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED
0,Sam,Mumbai,70,False
1,Jack,Mumbai,62,False
2,Jill,Mumbai,70,False
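
The comparison `df['WEIGHT'] > 60` yields one boolean per row, and indexing with that result keeps only the rows marked `True`. The plain-Python sketch below mimics this filtering on the same data. It only illustrates the idea and is not Ickle's implementation.

```python
# Same data as the DataFrame above, held in plain lists for illustration.
names = ['John', 'Sam', 'Tina', 'Josh', 'Jack', 'Jill']
weights = [57, 70, 54, 59, 62, 70]

# One boolean per row, like df['WEIGHT'] > 60.
mask = [w > 60 for w in weights]

# Keep only the rows whose mask entry is True.
selected = [(n, w) for n, w, keep in zip(names, weights, mask) if keep]
print(selected)
```

As in the output above, the surviving rows are renumbered from 0 rather than keeping their original positions.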


### Simultaneous selection of row and column
by passing both a row selection and a column selection as `df[row, col]`

In [290]:
df[0,2]

Unnamed: 0,WEIGHT
0,57


### Select columns as strings

In [291]:
df[0, 'WEIGHT']

Unnamed: 0,WEIGHT
0,57


### Select rows as slices

In [292]:
df[:1, 'WEIGHT']

Unnamed: 0,WEIGHT
0,57


### Select rows as booleans and lists

In [293]:
bool_row = df['MARRIED']

In [294]:
df[bool_row, 'WEIGHT']

Unnamed: 0,WEIGHT
0,57
1,54


### Add new / Overwrite existing columns

In [295]:
df['AGE'] = np.array([21, 41, 22, 42, 32, 25])
df

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,John,Kolkata,57,True,21
1,Sam,Mumbai,70,False,41
2,Tina,Delhi,54,True,22
3,Josh,Mumbai,59,False,42
4,Jack,Mumbai,62,False,32
5,Jill,Mumbai,70,False,25


## Basic And Aggregation Methods
Basic Methods:
1. [head()](#head())
2. [tail()](#tail())

Aggregation Methods:
1. [min()](#min())
2. [max()](#max())
3. [mean()](#mean())
4. [median()](#median())
5. [sum()](#sum())
6. [var()](#var())
7. [std()](#std())
8. [all()](#all())
9. [any()](#any())
10. [argmax()](#argmax())
11. [argmin()](#argmin())

### head(n)
returns: the first n rows. By default n=5 

In [296]:
df.head()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,John,Kolkata,57,True,21
1,Sam,Mumbai,70,False,41
2,Tina,Delhi,54,True,22
3,Josh,Mumbai,59,False,42
4,Jack,Mumbai,62,False,32


### tail(n)
returns: the last n rows. By default n=5

In [297]:
df.tail()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,Sam,Mumbai,70,False,41
1,Tina,Delhi,54,True,22
2,Josh,Mumbai,59,False,42
3,Jack,Mumbai,62,False,32
4,Jill,Mumbai,70,False,25


### Aggregation Methods

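All aggregation methods are applied column-wise: each returns a one-row `DataFrame` with one value per column. Methods that need numbers, such as `median()`, `mean()`, `var()`, and `std()`, leave string columns out of the result.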

## min()

In [298]:
df.min()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,Jack,Delhi,54,False,21


## max()

In [299]:
df.max()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,Tina,Mumbai,70,True,42


## median()

In [300]:
df.median()

Unnamed: 0,WEIGHT,MARRIED,AGE
0,60.5,0.0,28.5


## mean()

In [301]:
df.mean()

Unnamed: 0,WEIGHT,MARRIED,AGE
0,62.0,0.333,30.5
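
The `median()` and `mean()` outputs include the boolean `MARRIED` column, which suggests that `True` and `False` are counted as 1 and 0. The sketch below reproduces the figures with the standard library's `statistics` module. It is an independent check of the arithmetic, not a description of how Ickle computes them.

```python
import statistics

weights = [57, 70, 54, 59, 62, 70]
married = [True, False, True, False, False, False]
ages = [21, 41, 22, 42, 32, 25]

# Count booleans as 1 (True) and 0 (False).
married_as_int = [int(m) for m in married]

columns = (weights, married_as_int, ages)
means = [statistics.mean(col) for col in columns]
medians = [statistics.median(col) for col in columns]

print([round(m, 3) for m in means])
print(medians)
```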


## sum()

In [302]:
df.sum()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,JohnSamTinaJoshJackJill,KolkataMumbaiDelhiMumbaiMumbaiMumbai,372,2,183
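
For string columns, the outputs of `min()`, `max()`, and `sum()` match what Python's own string ordering and concatenation give: the smallest and largest names in lexicographic order, and all values joined end to end. A short plain-Python check, offered as an illustration rather than Ickle's code:

```python
names = ['John', 'Sam', 'Tina', 'Josh', 'Jack', 'Jill']
married = [True, False, True, False, False, False]

# Strings compare lexicographically, so min/max pick by alphabetical order.
smallest_name = min(names)
largest_name = max(names)

# "Summing" strings means concatenating them.
name_sum = ''.join(names)

# Summing booleans counts the True values.
married_sum = sum(married)
```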


## var()

In [303]:
df.var()

Unnamed: 0,WEIGHT,MARRIED,AGE
0,37.667,0.222,72.917


## std()

In [305]:
df.std()

Unnamed: 0,WEIGHT,MARRIED,AGE
0,6.137,0.471,8.539
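
The `var()` and `std()` figures above correspond to the population formulas (dividing by n), not the sample formulas (dividing by n - 1). The sketch below checks this for the `WEIGHT` column with the standard library. It verifies the numbers shown and makes no claim about Ickle's internals.

```python
import statistics

weights = [57, 70, 54, 59, 62, 70]

# Population variance: squared deviations divided by n.
population_var = statistics.pvariance(weights)

# Sample variance: squared deviations divided by n - 1.
sample_var = statistics.variance(weights)

# Population standard deviation: square root of the population variance.
population_std = statistics.pstdev(weights)

print(round(population_var, 3), round(sample_var, 3), round(population_std, 3))
```

Only the population values (37.667 and 6.137) agree with the `WEIGHT` column in the outputs.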


## all()

In [307]:
df.all()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,Jill,Mumbai,True,False,True


## any()

In [309]:
df.any()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,John,Kolkata,True,True,True


## argmax()

In [311]:
df.argmax()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,2,1,1,0,3


## argmin()

In [313]:
df.argmin()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,4,2,2,1,0
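
`argmax()` and `argmin()` return row positions, not values. When the extreme value occurs more than once (70 appears in rows 1 and 5 of `WEIGHT`), the outputs above show the first position. The plain-Python sketch below reproduces that behaviour for two columns, as an illustration only:

```python
names = ['John', 'Sam', 'Tina', 'Josh', 'Jack', 'Jill']
weights = [57, 70, 54, 59, 62, 70]

# list.index returns the first position of a value, so ties go to the earliest row.
weight_argmax = weights.index(max(weights))
weight_argmin = weights.index(min(weights))

# Strings are compared lexicographically.
name_argmax = names.index(max(names))
name_argmin = names.index(min(names))
```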


## Other Methods 
1. `isna()`
2. `count()`
3. `unique()`
4. `nunique()`
5. `value_counts()`
6. `rename()`
7. `drop()`
8. `diff()`
9. `pct_change()`
10. `sort_values()`
11. `sample()`

In [314]:
df.isna()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,False,False,False,False,False
1,False,False,False,False,False
2,False,False,False,False,False
3,False,False,False,False,False
4,False,False,False,False,False
5,False,False,False,False,False


In [315]:
df.count()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,6,6,6,6,6


In [316]:
dfs = df.unique()
dfs[3]

Unnamed: 0,MARRIED
0,False
1,True


In [317]:
df.nunique()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,6,3,5,2,6


In [318]:
dfs = df.value_counts()
dfs[1]

Unnamed: 0,PLACE,count
0,Mumbai,4
1,Delhi,1
2,Kolkata,1
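
`unique()` and `value_counts()` each return a list with one `DataFrame` per column, which is why the examples index the result (`dfs[3]`, `dfs[1]`). What each per-column result contains can be sketched with a set and a `collections.Counter`. This is a plain-Python illustration, not Ickle's implementation, and the order of tied counts may differ from the table above.

```python
from collections import Counter

places = ['Kolkata', 'Mumbai', 'Delhi', 'Mumbai', 'Mumbai', 'Mumbai']
married = [True, False, True, False, False, False]

# unique(): the distinct values of a column.
unique_married = sorted(set(married))

# nunique(): how many distinct values there are.
nunique_places = len(set(places))

# value_counts(): how often each distinct value occurs.
place_counts = Counter(places)
```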


In [319]:
df.rename({'WEIGHT': 'WEIGHT (kg)'})

Unnamed: 0,NAME,PLACE,WEIGHT (kg),MARRIED,AGE
0,John,Kolkata,57,True,21
1,Sam,Mumbai,70,False,41
2,Tina,Delhi,54,True,22
3,Josh,Mumbai,59,False,42
4,Jack,Mumbai,62,False,32
5,Jill,Mumbai,70,False,25


In [320]:
df.drop('AGE')

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED
0,John,Kolkata,57,True
1,Sam,Mumbai,70,False
2,Tina,Delhi,54,True
3,Josh,Mumbai,59,False
4,Jack,Mumbai,62,False
5,Jill,Mumbai,70,False


## Non-Aggregation Methods
1. `abs()`
2. `cummin()`
3. `cummax()`
4. `cumsum()`
5. `clip()`
6. `round()`
7. `copy()`

In [321]:
df.abs()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,John,Kolkata,57,True,21
1,Sam,Mumbai,70,False,41
2,Tina,Delhi,54,True,22
3,Josh,Mumbai,59,False,42
4,Jack,Mumbai,62,False,32
5,Jill,Mumbai,70,False,25


In [322]:
df.cummin()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,John,Kolkata,57,True,21
1,Sam,Mumbai,57,False,21
2,Tina,Delhi,54,False,21
3,Josh,Mumbai,54,False,21
4,Jack,Mumbai,54,False,21
5,Jill,Mumbai,54,False,21


In [323]:
df.cummax()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,John,Kolkata,57,True,21
1,Sam,Mumbai,70,True,41
2,Tina,Delhi,70,True,41
3,Josh,Mumbai,70,True,42
4,Jack,Mumbai,70,True,42
5,Jill,Mumbai,70,True,42


In [324]:
df.cumsum()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,John,Kolkata,57,1,21
1,Sam,Mumbai,127,1,62
2,Tina,Delhi,181,2,84
3,Josh,Mumbai,240,2,126
4,Jack,Mumbai,302,2,158
5,Jill,Mumbai,372,2,183
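
The cumulative methods keep one value per row: each row holds the minimum, maximum, or sum of everything from the first row down to that row. The sketch below reproduces the `WEIGHT` column of the three outputs with `itertools.accumulate`, purely as an illustration of the arithmetic.

```python
from itertools import accumulate

weights = [57, 70, 54, 59, 62, 70]

running_min = list(accumulate(weights, min))  # like cummin()
running_max = list(accumulate(weights, max))  # like cummax()
running_sum = list(accumulate(weights))       # like cumsum()

print(running_min)
print(running_max)
print(running_sum)
```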


In [325]:
df.clip(lower=55, upper=60)

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,John,Kolkata,57,55,55
1,Sam,Mumbai,60,55,55
2,Tina,Delhi,55,55,55
3,Josh,Mumbai,59,55,55
4,Jack,Mumbai,60,55,55
5,Jill,Mumbai,60,55,55
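
`clip(lower, upper)` pulls every value into the range `[lower, upper]`: values below `lower` become `lower`, values above `upper` become `upper`. In the output above this applies to all non-string columns, so every `AGE` (all below 55) and every `MARRIED` value (0 or 1) becomes 55. A plain-Python sketch of the rule, using a helper function `clip` written here for illustration:

```python
def clip(value, lower, upper):
    """Return value limited to the closed interval [lower, upper]."""
    return max(lower, min(value, upper))


weights = [57, 70, 54, 59, 62, 70]
ages = [21, 41, 22, 42, 32, 25]

clipped_weights = [clip(w, 55, 60) for w in weights]
clipped_ages = [clip(a, 55, 60) for a in ages]
```

To clip a single column, one option is to select it first (for example `df['WEIGHT']`) and call `clip` on the result, assuming the method behaves the same on a one-column `DataFrame`.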


In [326]:
df.round(n=1)

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,John,Kolkata,57,True,21
1,Sam,Mumbai,70,False,41
2,Tina,Delhi,54,True,22
3,Josh,Mumbai,59,False,42
4,Jack,Mumbai,62,False,32
5,Jill,Mumbai,70,False,25


In [327]:
df.copy()

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,John,Kolkata,57,True,21
1,Sam,Mumbai,70,False,41
2,Tina,Delhi,54,True,22
3,Josh,Mumbai,59,False,42
4,Jack,Mumbai,62,False,32
5,Jill,Mumbai,70,False,25


In [328]:
df.diff(n=1)

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,John,Kolkata,,,
1,Sam,Mumbai,13.0,-1.0,20.0
2,Tina,Delhi,-16.0,1.0,-19.0
3,Josh,Mumbai,5.0,-1.0,20.0
4,Jack,Mumbai,3.0,0.0,-10.0
5,Jill,Mumbai,8.0,0.0,-7.0


In [329]:
df.pct_change(n=1)

Unnamed: 0,NAME,PLACE,WEIGHT,MARRIED,AGE
0,John,Kolkata,,,
1,Sam,Mumbai,0.228,-1.0,0.952
2,Tina,Delhi,-0.229,inf,-0.463
3,Josh,Mumbai,0.093,-1.0,0.909
4,Jack,Mumbai,0.051,,-0.238
5,Jill,Mumbai,0.129,,-0.219
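
`diff(n)` subtracts from each row the value `n` rows earlier, and `pct_change(n)` divides that difference by the earlier value. The first `n` rows have nothing to compare against, so they are empty. The sketch below reproduces the `WEIGHT` column of both outputs in plain Python, using `None` for the empty entries. It illustrates the arithmetic only.

```python
weights = [57, 70, 54, 59, 62, 70]
n = 1  # distance between the rows being compared

pairs = list(zip(weights, weights[n:]))  # (earlier value, current value)

# diff: current value minus the value n rows earlier.
diff = [None] * n + [cur - prev for prev, cur in pairs]

# pct_change: that difference as a fraction of the earlier value.
pct_change = [None] * n + [round((cur - prev) / prev, 3) for prev, cur in pairs]

print(diff)
print(pct_change)
```

The `inf` and empty entries in the `MARRIED` column follow from the same formula under floating-point rules: going from `False` (0) to `True` (1) divides 1 by 0, and two consecutive `False` values divide 0 by 0. Plain Python would raise `ZeroDivisionError` in those cases, so the sketch is limited to `WEIGHT`.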


## Arithmetic And Comparison Operators 
1. [Addition](#Addition)
2. [Subtraction](#Subtraction)
3. [Multiplication](#Multiplication)
4. [Division](#Division)
5. [Floor Division](#Floor-Division)
6. [Power](#Power)
7. [Greater than](#Greater-than)
8. [Less than](#Less-than)
9. [Greater than or equal to](#Greater-than-equal)
10. [Less than or equal to](#Lesser-than-equal)
11. [Not Equal](#Not-Equal)
12. [Equal](#Equal)

*Arithmetic and comparison operators work only with numerical columns.*

In [330]:
df_op = df['WEIGHT']

### Addition

In [331]:
df_op + 2

Unnamed: 0,WEIGHT
0,59
1,72
2,56
3,61
4,64
5,72


In [332]:
2 + df_op

Unnamed: 0,WEIGHT
0,59
1,72
2,56
3,61
4,64
5,72


### Subtraction

In [333]:
df_op - 2

Unnamed: 0,WEIGHT
0,55
1,68
2,52
3,57
4,60
5,68


In [334]:
2 - df_op

Unnamed: 0,WEIGHT
0,-55
1,-68
2,-52
3,-57
4,-60
5,-68
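
Both `df_op - 2` and `2 - df_op` work because Python lets a class define reflected operators: when the left operand (the integer `2`) does not know how to handle the right operand, Python calls the right operand's `__rsub__`. The sketch below shows the mechanism with a hypothetical `Column` class invented for this example. It is not Ickle's `DataFrame`, and Ickle's actual implementation may differ.

```python
class Column:
    """Hypothetical stand-in for a one-column DataFrame (not Ickle's class)."""

    def __init__(self, values):
        self.values = list(values)

    def __sub__(self, other):
        # Handles `column - 2`.
        return Column(v - other for v in self.values)

    def __rsub__(self, other):
        # Handles `2 - column`: Python falls back to this method when
        # int.__sub__ does not know how to subtract a Column.
        return Column(other - v for v in self.values)


weight = Column([57, 70, 54, 59, 62, 70])
left = weight - 2
right = 2 - weight
print(left.values)
print(right.values)
```

Subtraction, division, floor division, and power are not commutative, which is why each of those sections shows two different outputs, while addition and multiplication give the same result in either order.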


### Multiplication 

In [335]:
df_op * 2

Unnamed: 0,WEIGHT
0,114
1,140
2,108
3,118
4,124
5,140


In [336]:
2 * df_op

Unnamed: 0,WEIGHT
0,114
1,140
2,108
3,118
4,124
5,140


### Division 

In [337]:
df_op / 2

Unnamed: 0,WEIGHT
0,28.5
1,35.0
2,27.0
3,29.5
4,31.0
5,35.0


In [338]:
2 / df_op 

Unnamed: 0,WEIGHT
0,0.035
1,0.029
2,0.037
3,0.034
4,0.032
5,0.029


### Floor Division 

In [339]:
df_op // 2

Unnamed: 0,WEIGHT
0,28
1,35
2,27
3,29
4,31
5,35


In [340]:
2 // df_op

Unnamed: 0,WEIGHT
0,0
1,0
2,0
3,0
4,0
5,0


### Power 

In [341]:
df_op ** 3

Unnamed: 0,WEIGHT
0,185193
1,343000
2,157464
3,205379
4,238328
5,343000


In [342]:
3 ** df_op

Unnamed: 0,WEIGHT
0,886634019
1,-102221863
2,32838297
3,-610228421
4,703701817
5,-102221863
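
The values shown for `3 ** df_op` are not the true powers: 3 raised to 57 is a 28-digit number, and a power of 3 can never be negative. The figures are what remains after integer overflow in a 32-bit signed integer. That the underlying NumPy array used a 32-bit integer dtype is an inference from the numbers, not something the notebook states. The sketch below emulates the wraparound in plain Python with a helper `wrap_int32` written for this example.

```python
def wrap_int32(value):
    """Reduce an exact integer to what a 32-bit signed integer would store."""
    return (value + 2**31) % 2**32 - 2**31


weights = [57, 70, 54, 59, 62, 70]

# Python integers have arbitrary precision, so these are the exact powers.
exact = [3 ** w for w in weights]

# What survives when each result is squeezed into 32 bits.
wrapped = [wrap_int32(v) for v in exact]

print(exact[0])
print(wrapped)
```

A 64-bit integer would overflow as well, since 3 to the power 57 exceeds its range. If approximate results are acceptable, one possible workaround is to build the column from a floating-point array (for example `weight.astype(float)` in NumPy) before creating the `DataFrame`. How Ickle displays such a column is not shown here.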


### Greater than

In [343]:
df_op > 50

Unnamed: 0,WEIGHT
0,True
1,True
2,True
3,True
4,True
5,True


### Less than

In [344]:
df_op < 55

Unnamed: 0,WEIGHT
0,False
1,False
2,True
3,False
4,False
5,False


### Greater than equal 

In [345]:
df_op >= 75

Unnamed: 0,WEIGHT
0,False
1,False
2,False
3,False
4,False
5,False


### Lesser than equal 

In [346]:
df_op <= 55

Unnamed: 0,WEIGHT
0,False
1,False
2,True
3,False
4,False
5,False


### Not Equal 

In [347]:
df_op != 55

Unnamed: 0,WEIGHT
0,True
1,True
2,True
3,True
4,True
5,True


### Equal 

In [348]:
df_op == 70

Unnamed: 0,WEIGHT
0,False
1,True
2,False
3,False
4,False
5,True


## String-Only Methods
All string methods behave in the same manner as Python's built-in string methods.
They are called through `df.str`, take the column name as their first argument, and can be used only with string columns.

1. [capitalize](#capitalize(col))
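
Because these methods mirror Python's built-in `str` methods, a result can be predicted by applying the built-in method to each value in the column. The sketch below does this for a few of the methods documented in this section, using a plain list in place of the `NAME` column.

```python
names = ['John', 'Sam', 'Tina', 'Josh', 'Jack', 'Jill']

# Each comprehension applies one built-in str method to every value.
centered = [s.center(10, 'a') for s in names]  # like df.str.center('NAME', 10, 'a')
padded = [s.zfill(10) for s in names]          # like df.str.zfill('NAME', 10)
swapped = [s.swapcase() for s in names]        # like df.str.swapcase('NAME')
found = [s.find('Tina') for s in names]        # like df.str.find('NAME', 'Tina')

print(centered)
print(found)
```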

### capitalize(col)

In [353]:
df.str.capitalize('NAME')

Unnamed: 0,NAME
0,John
1,Sam
2,Tina
3,Josh
4,Jack
5,Jill


### center(col, width, fillchar=None)

In [391]:
df.str.center('NAME', 10, 'a')

Unnamed: 0,NAME
0,aaaJohnaaa
1,aaaSamaaaa
2,aaaTinaaaa
3,aaaJoshaaa
4,aaaJackaaa
5,aaaJillaaa


### count(col, sub, start=None, stop=None)

In [393]:
df.str.count('PLACE', 'Mumbai')

Unnamed: 0,PLACE
0,0
1,1
2,0
3,1
4,1
5,1


### endswith()

In [394]:
df.str.endswith('NAME', 'n')

Unnamed: 0,NAME
0,True
1,False
2,False
3,False
4,False
5,False


### startswith()

In [395]:
df.str.startswith('NAME', 'J')

Unnamed: 0,NAME
0,True
1,False
2,False
3,True
4,True
5,True


### find()

In [400]:
df.str.find('NAME', 'Tina')

Unnamed: 0,NAME
0,-1
1,-1
2,0
3,-1
4,-1
5,-1


### len()

In [361]:
df.str.len('NAME')

Unnamed: 0,NAME
0,4
1,3
2,4
3,4
4,4
5,4


### get()

In [403]:
df.str.get('NAME', 0)

Unnamed: 0,NAME
0,J
1,S
2,T
3,J
4,J
5,J


### index()

### isalnum()

In [442]:
df.str.isalnum('NAME')

Unnamed: 0,NAME
0,True
1,True
2,True
3,True
4,True
5,True


### isalpha()

In [439]:
df.str.isalpha('NAME')

Unnamed: 0,NAME
0,True
1,True
2,True
3,True
4,True
5,True


### isdecimal()

In [438]:
df.str.isdecimal('NAME')

Unnamed: 0,NAME
0,False
1,False
2,False
3,False
4,False
5,False


### isnumeric()

In [437]:
df.str.isnumeric('NAME')

Unnamed: 0,NAME
0,False
1,False
2,False
3,False
4,False
5,False


### isspace()

In [433]:
df.str.isspace('NAME')

Unnamed: 0,NAME
0,False
1,False
2,False
3,False
4,False
5,False


### istitle()

In [432]:
df.str.istitle('NAME')

Unnamed: 0,NAME
0,True
1,True
2,True
3,True
4,True
5,True


### islower()

In [431]:
df.str.islower('NAME')

Unnamed: 0,NAME
0,False
1,False
2,False
3,False
4,False
5,False


### isupper()

In [430]:
df.str.isupper('NAME')

Unnamed: 0,NAME
0,False
1,False
2,False
3,False
4,False
5,False


### lstrip()

In [429]:
df.str.lstrip('NAME', 'o')

Unnamed: 0,NAME
0,John
1,Sam
2,Tina
3,Josh
4,Jack
5,Jill


### rstrip()

In [428]:
df.str.rstrip('NAME', 'o')

Unnamed: 0,NAME
0,John
1,Sam
2,Tina
3,Josh
4,Jack
5,Jill


### strip()

In [427]:
df.str.strip('NAME', 'o')

Unnamed: 0,NAME
0,John
1,Sam
2,Tina
3,Josh
4,Jack
5,Jill
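
The three stripping examples leave every name unchanged because, as with Python's built-in methods, characters are removed only from the ends of a string, and no name starts or ends with `'o'`. The plain-Python sketch below contrasts that with arguments that do match an end character.

```python
name = 'John'

# 'o' sits in the middle of the string, so none of the variants touch it.
unchanged = (name.lstrip('o'), name.rstrip('o'), name.strip('o'))

# Characters at the ends are removed.
left_stripped = name.lstrip('J')   # leading 'J' removed
right_stripped = name.rstrip('n')  # trailing 'n' removed
both_stripped = name.strip('Jn')   # any of 'J' or 'n' removed from both ends
```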


### replace()

In [424]:
df.str.replace('NAME', 'John', 'Cena')

Unnamed: 0,NAME
0,Cena
1,Sam
2,Tina
3,Josh
4,Jack
5,Jill


### swapcase()

In [423]:
df.str.swapcase('NAME')

Unnamed: 0,NAME
0,jOHN
1,sAM
2,tINA
3,jOSH
4,jACK
5,jILL


### title()

In [422]:
df.str.title('NAME')

Unnamed: 0,NAME
0,John
1,Sam
2,Tina
3,Josh
4,Jack
5,Jill


### lower()

In [421]:
df.str.lower('NAME')

Unnamed: 0,NAME
0,john
1,sam
2,tina
3,josh
4,jack
5,jill


### upper()

In [420]:
df.str.upper('NAME')

Unnamed: 0,NAME
0,JOHN
1,SAM
2,TINA
3,JOSH
4,JACK
5,JILL


### zfill()

In [417]:
df.str.zfill('NAME', 10)

Unnamed: 0,NAME
0,000000John
1,0000000Sam
2,000000Tina
3,000000Josh
4,000000Jack
5,000000Jill


### encode()

In [415]:
df.str.encode('NAME')

Unnamed: 0,NAME
0,b'John'
1,b'Sam'
2,b'Tina'
3,b'Josh'
4,b'Jack'
5,b'Jill'


## Pivot Table
Creates a pivot table from one or two grouping columns.

Parameters:
- `rows`: str of column name to group by (optional)
- `columns`: str of column name to group by (optional)
- `values`: str of column name to aggregate (required)
- `aggfunc`: str of aggregation function

In [349]:
df.pivot_table(rows='NAME', columns='PLACE', values='WEIGHT', aggfunc='mean')

Unnamed: 0,NAME,Delhi,Kolkata,Mumbai
0,Jack,,,62.0
1,Jill,,,70.0
2,John,,57.0,
3,Josh,,,59.0
4,Sam,,,70.0
5,Tina,54.0,,
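
Conceptually, a pivot table groups the `values` column by each combination of a `rows` key and a `columns` key, then applies `aggfunc` to every group. Combinations that never occur stay empty, which is why most cells in the output above are blank. The sketch below reproduces the filled cells in plain Python. It illustrates the idea and is not Ickle's implementation.

```python
from collections import defaultdict
from statistics import mean

names = ['John', 'Sam', 'Tina', 'Josh', 'Jack', 'Jill']
places = ['Kolkata', 'Mumbai', 'Delhi', 'Mumbai', 'Mumbai', 'Mumbai']
weights = [57, 70, 54, 59, 62, 70]

# Collect the values that fall into each (row key, column key) cell.
cells = defaultdict(list)
for name, place, weight in zip(names, places, weights):
    cells[(name, place)].append(weight)

# Aggregate each cell with the mean; cells that never occur are simply absent.
pivot = {key: float(mean(values)) for key, values in cells.items()}

for (name, place), value in sorted(pivot.items()):
    print(name, place, value)
```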


## Read CSV

`read_csv(file)` reads a CSV file into a `DataFrame`. `file` is a string giving the file location.

In [385]:
data = ick.read_csv('./dataset/employee.csv')
data.head()

Unnamed: 0,dept,race,gender,salary
0,Houston Police Department-HPD,White,Male,45279
1,Houston Fire Department (HFD),White,Male,63166
2,Houston Police Department-HPD,Black,Male,66614
3,Public Works & Engineering-PWE,Asian,Male,71680
4,Houston Airport System (HAS),White,Male,42390
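
For readers who want to see what reading a CSV into columns involves, the sketch below does it with the standard library's `csv` module. The sample text copies the first two rows of the output above and assumes the file has a header row with these four column names. It is not Ickle's `read_csv` implementation.

```python
import csv
import io

# Stand-in for the file contents; the layout is assumed from the output above.
sample = io.StringIO(
    "dept,race,gender,salary\n"
    "Houston Police Department-HPD,White,Male,45279\n"
    "Houston Fire Department (HFD),White,Male,63166\n"
)

reader = csv.DictReader(sample)

# One list per column, keyed by the header names.
columns = {name: [] for name in reader.fieldnames}
for row in reader:
    for name, value in row.items():
        columns[name].append(value)

# CSV fields arrive as text; convert the numeric column explicitly.
columns['salary'] = [int(v) for v in columns['salary']]

print(columns['salary'])
```

Each list could then be wrapped in a NumPy array and passed to `ick.DataFrame` as a dictionary, as described in the DataFrame section.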
