ARCHIVED

🔥🔥🔥
🔥🔥🔥
THIS REPOSITORY IS NOW ARCHIVED. UPDATES CONTINUE AT
https://github.com/whitepawglobal/bite-size-python
🔥🔥🔥
🔥🔥🔥

Snippets of Code for Data Science Operations in Python

Environment Setup

Create environment (Only for the first time)

git clone https://github.com/codenamewei/pydata-science-playground.git
cd <path-to>/pydata-science-playground
conda env create -f config.yml

Activate environment

conda activate pyplayground

Package Installation

Install package with pip
pip install <package-name>. Example: pip install numpy

For more pip commands, check out pip guidelines document

Install package with conda

conda install <package>. Example: conda install numpy

For more conda commands, check out conda guidelines document

Bite-Size Python

Basic
Intermediate
Advanced
Software Development
Machine Learning
Medium Posts

Basic

Comment

Single Line Comment: # sample text

Multi Lines Comment:

 """
 Hello World!
 Nice to meet all of you cookie monsters!
 """

Boolean Operator

Maths

Define Nan, Infinite
Sum up an array: sum(arr)
Round up a number to a certain decimal point: round(value, 1)
Calculate percentile
Power of a number: pow(base_number, exponent_number)
Square root of a number: import math; math.sqrt(number)
Logarithm / Log
- Log to the base of 2:
  - Numpy: import numpy as np; np.log2(10)
  - Math: import math; math.log2(10)
  - Plotting of log to the base of 2
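
A minimal sketch of the operations listed above, using only the standard library. The variable names and sample values are illustrative assumptions, and `statistics.quantiles` is shown as one possible way to get percentiles, not necessarily the one used in the notebooks.

```python
import math
import statistics

arr = [1.5, 2.5, 3.0]

nan_value = float("nan")      # Not a Number
inf_value = float("inf")      # positive infinity

total = sum(arr)              # sum up an array
rounded = round(3.14159, 1)   # round to 1 decimal place
power = pow(2, 10)            # 2 to the power of 10
root = math.sqrt(16)          # square root
log_base_2 = math.log2(8)     # logarithm to the base of 2

# With n=4, quantiles returns the three quartile cut points
# (25th, 50th and 75th percentiles).
quartiles = statistics.quantiles([1, 2, 3, 4, 5, 6, 7], n=4)

print(total, rounded, power, root, log_base_2, quartiles)
print(math.isnan(nan_value), math.isinf(inf_value))
```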

Data Types

Floating Value (float, double)

Format floating value to n decimal: "%.2f" % floating_var

Bytes

Notes:

Difference between bytes() and bytearray() is that bytes() returns an object that cannot be modified (immutable), 
and bytearray() returns an object that can be modified (mutable).

Numpy <> Bytes, Bytes <> Numpy
Bytes -> String: bytesobj.decode("utf-8")
String -> Bytes: strobj.encode("utf-8")
Bytes -> Multimedia file (video/audio)
Check bytes encoding
To Bytes: bytes(<value>)
Get size of bytes object: import sys;sys.getsizeof(bytesobject)
Split bytes to chunks
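
A short sketch of the conversions above, plus the bytes/bytearray mutability difference described in the note. The sample text and `chunk_size` are illustrative assumptions.

```python
text = "hello world"

# String -> bytes -> string round trip
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")

# bytes is immutable, bytearray is mutable
mutable = bytearray(encoded)
mutable[0] = ord("H")          # allowed on a bytearray
try:
    encoded[0] = ord("H")      # raises TypeError on bytes
    bytes_is_mutable = True
except TypeError:
    bytes_is_mutable = False

# Split a bytes object into fixed-size chunks; the last chunk may be shorter
payload = bytes(range(10))
chunk_size = 4
chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]

print(decoded, mutable, bytes_is_mutable, [len(c) for c in chunks])
```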

ByteArray

Notes:

Difference between bytes() and bytearray() is that bytes() returns an object that cannot be modified (immutable), 
and bytearray() returns an object that can be modified (mutable).

Integer to Bytearray
Native Array to Bytearray
Numpy Array to Bytearray
[Image as Bytearray](notebooks/cv/image_as bytearray.ipynb)
Check bytes array encoding
To ByteArray: bytearray(<value>)

Numpy

Numpy basic
Get numpy shape: nparray.shape
Numpy array to list: nparray.tolist()
Change datatype: nparray = nparray.astype(<dtype>) Example: nparray = nparray.astype("uint8")
Numpy NaN (Not A Number): Constant to act as a placeholder for any missing numerical values in the array: np.nan (prefer this spelling; the np.NaN alias was removed in NumPy 2.0)
Numpy multiply by a value: nparray = nparray * 255
Numpy array to image
Numpy <> Binary File(.npy)
Use of numpy.where

String

Generate string with parameter
- Using f-string: print(f'Completed part {part_id}')
- Generate string with templates
- String formatting method: print('Completed part {part_id}'.format(part_id=part_id))
- Embed the repr() of a value with !r: varname = "world"; print(f"Hello {varname!r}")
Check if string is empty, len = 0: if not strvar:
Check if string contains digit: any(ch.isdigit() for ch in str1) # returns True if there is at least one digit
Check file extension: notebooks/string/check_file_extension.ipynb
Capitalize a string: strvar.capitalize()
Uppercase a string: strvar.upper()
Lowercase a string: strvar.lower()
Get substring from a string: strvar[<begin-index>:<end-index>] / strvar[<begin-index>:] / strvar[:<end-index>]
Remove white spaces in the beginning and end: strvar.strip()
Swap existing upper and lower case: strvar.swapcase()
Capitalize every first letter of a word: strvar.title()
Splitting string:
- Split a string based on separator: strvar.split(separator) Example: strvar.split("x")
- Split on white space: strvar.split()
- If split with every character, do this instead: [*"ABCDE"] Result: ["A", "B", "C", "D", "E"]
Check if string starts with a substring: strvar.startswith(<substring>)
Check if string ends with a substring: strvar.endswith(<substring>)
Check if string have substring/specific character. Returns -1 if not found: strvar.find(<substring>)
String get substring with index: str[startindex:endindex]
Replace string/character with intended string/character: strout = strin.replace(" ", "_")
Replace multiple string/characters with intended string/character
Generate random string
List to string: <separators>.join(list) example: ', '.join(listbuffer)
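
A small sketch tying together several of the string operations above. The variable names (`part_id`, `varname`) and sample strings are illustrative.

```python
part_id = 7

# Three equivalent ways to build a string from a variable
with_fstring = f"Completed part {part_id}"
with_format = "Completed part {part_id}".format(part_id=part_id)
with_percent = "Completed part %d" % part_id

# !r embeds repr() of the value, so a string keeps its quotes
varname = "world"
with_repr = f"Hello {varname!r}"

# Splitting and joining
words = "a quick  brown fox".split()   # split on any run of whitespace
chars = [*"ABCDE"]                     # one item per character
joined = ", ".join(words)

has_digit = any(ch.isdigit() for ch in "abc123")
print(with_fstring, with_repr, words, chars, joined, has_digit)
```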

Unique Identifier (UUID)

Datetime

datetime: datetime.ipynb
find differences of two datetime: use divmod
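
One possible way to apply divmod to the difference of two datetimes is sketched below. The two timestamps are made-up values for illustration.

```python
from datetime import datetime

start = datetime(2023, 1, 1, 8, 30, 0)
end = datetime(2023, 1, 2, 10, 45, 30)

delta = end - start                       # subtracting datetimes gives a timedelta
total_seconds = int(delta.total_seconds())

# divmod returns (quotient, remainder), which peels off one unit at a time
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(f"{hours}h {minutes}m {seconds}s")
```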

Data Structure

List

List of str to int: list(map(int, arr))
List with range of values: list(range(...))
Split str to list of str: arr.split(" ")
Check for empty list: if not mylist:
Find if a value in a list: if value in mylist: / if value not in mylist:
Sort an array in place: arr.sort() / Return a sorted array: sorted(arr)
Get index of a value: arr.index(value)
Add one more value to existing list: arr.append(value)
Extend list with values in another list: arr.extend(arr2)
Remove an item from the list: arr.remove(item)
Check for empty list: arr = []; if not arr: #empty list
Check all items in a list(subset) if exist in another list, returns boolean: set(b).issubset(v)
Build list of same values: ['100'] * 20 # 20 items of the value '100'
Change values of list with List Comprehension: [func(a) for a in sample_list]
Iteration of list with index: for index, value in enumerate(inlist):
Iteration over two lists: [(item1, item2) for item1, item2 in zip(list1, list2)]
Count occurrence of items in list
Get maximum value in a list of numbers (even strings): max(samplelist)
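
A compact sketch of several list operations from this section. The sample values are illustrative, and `collections.Counter` is shown as one common way to count occurrences (an assumption, since the source does not say which approach its notebook uses).

```python
from collections import Counter

arr = list(map(int, ["3", "1", "2", "3"]))   # list of str -> list of int
sorted_copy = sorted(arr)                     # new sorted list; arr is unchanged
arr.append(5)                                 # add one value
arr.extend([8, 13])                           # add all values from another list

is_subset = set([1, 2]).issubset(arr)

# enumerate yields (index, value); zip pairs items from two lists
indexed = [(index, value) for index, value in enumerate(["a", "b"])]
paired = [(item1, item2) for item1, item2 in zip([1, 2], ["x", "y"])]

counts = Counter(arr)                         # maps each item to its count

print(arr, sorted_copy, is_subset, indexed, paired, counts[3])
```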

Dictionary

Define dict with str keys
Add new key value pair: dict.update({"key2":"value2"})
Remove key-value pair by referring to a specific key
Get keys as list: list(lut.keys())
Get values as list: list(lut.values())
Create dict from list: {i: 0 for i in arr}
Handling missing items in dict
Iteration to dict to get keys and values
Save/load dictionary to/from a file
Revert or inverse a dictionary mapping: inv_map = {v: k for k, v in my_map.items()}
Copy by value: sampledict.copy()
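
The dictionary operations above can be sketched as follows. The dictionary contents are made up, and `dict.get` and `collections.defaultdict` are two common options for handling missing items, shown here as assumptions about what that entry covers.

```python
from collections import defaultdict

lut = {"apple": 1, "banana": 2}
lut.update({"cherry": 3})        # add a new key-value pair
removed = lut.pop("banana")      # remove a pair by key and return its value

# Handling missing keys: get() with a default avoids KeyError
missing = lut.get("durian", 0)

# defaultdict creates the default value on first access
counter = defaultdict(int)
for fruit in ["apple", "apple", "cherry"]:
    counter[fruit] += 1

# Invert the mapping; this only works cleanly when values are unique and hashable
inv_map = {v: k for k, v in lut.items()}

for key, value in lut.items():   # iterate over keys and values together
    print(key, value)
```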

Named Tuple

NamedTuple

Applicable to Python Iterables (List, Set,...)

To identify if any items in the iterables has True/1 values: any(sample_list) #returns single value True/False
Zip multiple iterables

Pandas

Pandas Info

Dataframe basic
- Get # rows and columns
- Get summary/infos about dataframe
Get data types
Dataframe/Series Min, Max, Median, General Description
Get rows name (index) and columns name (column)
Get a glimpse of dataframe
Get subset of a dataframe by rows/by columns
Get rows by finding matching values from a specific column
Check if a column name exist in dataframe - if 'code' in df.columns:
Iteration of each rows in a dataframe

Pandas Operations

Check if dataframe is empty: df.empty #return boolean
Get dataframe from list

Build dataframe with columns name

column_list = ["a", "b"]
df = pd.DataFrame(columns = column_list)

Build a new dataframe from a subset of columns from another dataframe
Get subset of dataframe, sample columns with specific criteria
- Sample by percentage
- Sample by # of rows specified
- Sample by matching to a value
Column to list: df.columns.tolist()
Sample rows: df = df.sample(frac=1).reset_index(drop=True)
Referring to dataframe column by key or by string
Concatenate dataframe
- Concatenate by adding rows
Append string to all rows of a column
Reset index without creating new (index) column: df.reset_index(drop=True)
Assign df by copy instead of reference - df.copy()
Shuffle rows of df: df = df.sample(frac=1).reset_index(drop=True)
Pandas with multiple index

Pandas Type

Pandas Series

Series to value
Series/Dataframe to numpy array: input.to_numpy()
Series iteration: for index, item in seriesf.items():
Series to dict: seriesf.to_dict()

Pandas Assign values

Pandas Remove/drop values

Drop duplicates for df / subset, keep one copy and remove all
Remove/drop rows where specific column matched value
Remove specific columns with column name
Drop rows by index
Drop rows/columns with np.nan: df3 = df3.dropna(axis = 0) # drop rows / df3 = df3.dropna(axis = 1) # drop columns

Pandas SQL-like functions

Pivot table: TODO
- Drawback: Not able to do filtering selection
Merge two dataframes based on certain column values

Pandas Filtering

Filter with function isin()
Filter df with item not in list
Filter with function query()
Find with loc
- df.loc[df['address'].eq('johndoe@gmail.com')] #filter with one value
- df.loc[df.a.eq(123) & df.b.eq("helloworld")] #filter with one value in multiple columns
- df.loc[df.a.isin(valuelist)] #filter with a few values in a list
Assign value to specific column(s) by matching value
Get a subset of dataframe by rows - df.iloc[<from_rows>:<to_rows>, :]
Count items and filter by counter values
Retrieve columns name which match specific str

Pandas Excel In/Out

Read in excel with specific sheet name: pd.read_excel(<url>, sheet_name = "Sheet1", engine = "openpyxl")
- Note: Install engine by pip install openpyxl
Read number of sheets in excel
Save excel: df.to_excel('file_name', index = False)
Write to multiple sheets

Pandas CSV In/Out

Read csv with other delimiter: pd.read_csv(<path-to-file>, delimiter = '\x01')
Read csv with bad lines: pd.read_csv(<path-to-file>, on_bad_lines='skip')
- Note: pd.read_csv(<path>, error_bad_lines = False) is deprecated
Read csv with encoding: pd.read_csv('file name', encoding = 'utf-8')
Save to csv: df.to_csv('file name', index = False)
- Note: Setting index = False is important to prevent an extra index column from being saved.
Save to csv with encoding: df.to_csv('file name', encoding = 'utf-8')

Pandas JSON In/Out

Pandas Parquet In/Out

Read in parquet: pd.read_parquet(...)
Write to parquet: df.to_parquet(...)

Pandas Pickle In/Out

Note: Pickle poses a security risk (unpickling untrusted data can execute arbitrary code) and can be slow to serialize, even compared with csv and json. Avoid it where possible.

Read in pickle to dataframe: df = pd.read_pickle(<file_name>) # ends with .pkl
Save to pickle: df.to_pickle(<file_name>)

Pandas Dataframe Others

Random dataframe and database table generator

Random

Generate random integer within [min, max], both ends inclusive: from random import randint; randint(0, 100) # between 0 and 100 inclusive
Generate random floating value: from random import random; random()
Randomly choosing an item out from a list: import random; random.choice([123, 456, 378])
Generate list with random number: import random; random.sample(range(10, 30), 5)
- Example shown where 5 unique random numbers are drawn from 10 to 29 (range(10, 30) excludes 30)
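
A minimal sketch of the random helpers above. The seed is an addition for repeatability only and is not part of the original snippets.

```python
import random

random.seed(42)   # optional; used here only to make runs repeatable

integer = random.randint(0, 100)           # 0 <= integer <= 100, both ends inclusive
fraction = random.random()                 # 0.0 <= fraction < 1.0
choice = random.choice([123, 456, 378])    # one item picked from the list
sample = random.sample(range(10, 30), 5)   # 5 unique values from 10..29

print(integer, fraction, choice, sample)
```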

Intermediate

Error Handling

Types of Built-In Exceptions

ValueError: argument of the correct data type but an inappropriate value
TypeError: the data type of an object is incorrect
IndexError: Raised when a sequence subscript is out of range
KeyError: When key cannot be found
ZeroDivisionError: when a number is divided by zero
OSError: error from an os-specific function
FileNotFoundError: when a file or directory is requested but doesn’t exist
NotImplementedError: commonly raised when an abstract method is not implemented in a derived class
NameError: reference to some name (variable, function, class) that hasn’t been defined
AttributeError: reference to certain attribute in a class which does not exist
ImportError: Trouble loading a module
- Submodule
  - ModuleNotFoundError: raised when the module being imported cannot be found
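
To illustrate how a few of these built-in exceptions surface in practice, here is a small sketch. The helper names `safe_divide` and `describe_error` are hypothetical and exist only for this example.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when the division is not possible."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None          # a number was divided by zero
    except TypeError:
        return None          # e.g. dividing a string by a number


def describe_error(func):
    """Call func and return the name of the exception it raises, if any."""
    try:
        func()
    except (ValueError, IndexError, KeyError) as err:
        return type(err).__name__
    return "no error"


print(safe_divide(1, 0))
print(describe_error(lambda: int("abc")))      # right type (str), inappropriate value
print(describe_error(lambda: [1, 2][5]))       # subscript out of range
print(describe_error(lambda: {"a": 1}["b"]))   # key not found
```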

File System

The character used by the operating system to separate pathname components: os.sep
Iterate through a path to get files/folders of all the subpaths
Write file: f.write(str)
print without new line: print(..., end="")
Get environment variable (second param is optional): import os; os.getenv(<VARIABLE_NAME> : str, <alternative-return-value> : str)
Flush out print
Check if path is a folder: os.path.isdir(<path>)
Get file size
- from pathlib import Path; outsize : int = Path(inputfilepath).stat().st_size
- import os; outsize : int = os.path.getsize(inputfilepath)
Create folder: os.mkdir(<path>)
Create folders recursively: os.makedirs(<path>)
Get folder path out of given path with filename: os.path.dirname(<path-to-file>)
Expand home directory: os.path.expanduser('~')
Get current working directory: os.getcwd()
Get the list of all files and directories in the specified directory (does not expand to items in the child folders): os.listdir(<path>)
Get the directory of the current file (getcwd returns the working directory the script was launched from; this returns the directory of the individual .py file): os.path.dirname(os.path.abspath(__file__))
Get filename from path: os.path.basename(configfilepath)
Split extension from rest of path(Including .): filename, ext = os.path.splitext(path)
Append a path to the module search path: sys.path.append(<path>)
Check if path exist: os.path.exists(<path>)
Remove a file: os.remove()
Get size of current file in byte: os.path.getsize(<path>) or from pathlib import Path; Path(<path>).stat().st_size
Removes an empty directory: os.rmdir()
Deletes a directory and all its contents: shutil.rmtree()
Copy a file to another path
Unzip file
Readfile
```
open(<path-to-file>, mode)
```
- `r`: Open a text file for reading
- `w`: Open a text file for writing
- `a`: Open a text file for appending
- [`b`: Open to read/write as bytes](notebooks/cv/image_as_byte.ipynb)

File objects provide three read functions, plus write and close:
- read() or read(size): read the whole file, or size characters, as one string.
- readline(): read a single line from a text file and return the line as a string.
- readlines(): read all the lines of the text file into a list of strings.
- write(<param> : str): write param to the file. A \n must be added explicitly to end a line.
- .close(): close the file object
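
A self-contained sketch of several file system calls from this section, run inside a temporary directory so nothing is left behind. The folder and file names are illustrative assumptions.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    nested = os.path.join(workdir, "a", "b")
    os.makedirs(nested)                          # create folders recursively

    filepath = os.path.join(nested, "notes.txt")
    with open(filepath, "w") as f:               # the with block closes the file automatically
        f.write("first line\n")                  # \n must be added explicitly
        f.write("second line\n")

    with open(filepath, "r") as f:
        lines = f.readlines()                    # list of strings, one per line

    filename = os.path.basename(filepath)        # file name without the folder part
    stem, ext = os.path.splitext(filename)       # ext includes the leading dot
    size_in_bytes = Path(filepath).stat().st_size
    same_size = os.path.getsize(filepath) == size_in_bytes
    is_folder = os.path.isdir(nested)

print(lines, filename, stem, ext, size_in_bytes, is_folder)
```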

System

Get system input
Check operating system: import platform; platform.system()
Check if port is open/close

Time

Measure time prior and after
Add delay to execution of the program by pausing: import time;time.sleep(seconds)
- Note: stops the execution of current thread only
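
One common way to measure elapsed time around a block of code is sketched below. `time.perf_counter` is an assumption here; the source does not say which clock its notebook uses.

```python
import time

start = time.perf_counter()     # high-resolution clock intended for measuring intervals
time.sleep(0.05)                # pause the current thread for 50 ms
elapsed = time.perf_counter() - start

print(f"Elapsed: {elapsed:.3f} s")
```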

Advanced

Class

Effective way to view object address and object
Reserved methods in class
The magic variable *args and **kwargs
Check if object is of specified type: isinstance(obj, MyClass) / isinstance(obj, (type1, type2) : tuple)
Deep Copy, Shallow Copy
- Copy list by value (shallow copy): list_cp = list_ori[:] (Note: list_cp = list_ori copies by reference; nested objects are still shared unless copy.deepcopy is used)
Define dataclass
- dataclass 1
- dataclass 2
  - Methods such as __init__, __repr__ and __eq__ are generated when a class is defined with @dataclass
Implement Enum in Python
Serialize class object

Magic Method

__dict__ returns the instance attributes of an object as a dict (class-level attributes are not included): obj.__dict__
__str__ return string representation of the obj: def __str__(self):
__eq__ compare the instances of the class: def __eq__(self, other):
- Define eq function in class 1
- Define eq function in class 2
__repr__: represent a class's objects as a string. Call object with repr(obj)
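
The magic methods above can be sketched in one small class. `Book` and its attributes are hypothetical names used only for illustration.

```python
class Book:
    def __init__(self, title, pages):
        self.title = title
        self.pages = pages

    def __str__(self):
        # Readable form, used by print() and str()
        return f"{self.title} ({self.pages} pages)"

    def __repr__(self):
        # Unambiguous form, used by repr() and the interactive prompt
        return f"Book(title={self.title!r}, pages={self.pages!r})"

    def __eq__(self, other):
        # Two books are equal when all their instance attributes match
        if not isinstance(other, Book):
            return NotImplemented
        return self.__dict__ == other.__dict__


book = Book("Sample", 100)
print(str(book))
print(repr(book))
print(book.__dict__)
print(book == Book("Sample", 100))
```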

Regular Expression (Regex)

Find matching word/character 1
- Introduction of functions in re library
- Square brackets for upper and lower case [Ww]oodchuck
Find matching word/character 2
- Optional character with ?
- Optional 0 or more character with *
- Optional 1 or more character with +
- Any character with .
Find matching word/character 3
- Whitespace character find with \s
- Non-whitespace character find with \S
Find matching word/character 4
- Caret before square bracket:^[] to indicate beginning
- Dollar sign after square bracket:[]$ to indicate ending
Negation
Disjunction
- To match a series of patterns with parenthesis.
Extract hashtags
Extract numbers from string
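
A short sketch of a few of the patterns above using the `re` module. The sample sentence is made up for illustration.

```python
import re

text = "The Woodchuck and the woodchuck posted #python and #regex 42 times in 2023."

# Square brackets match either case of the first letter
woodchucks = re.findall(r"[Ww]oodchuck", text)

# \w+ matches one or more word characters after the '#'
hashtags = re.findall(r"#(\w+)", text)

# \d+ matches runs of digits
numbers = [int(n) for n in re.findall(r"\d+", text)]

# ^ anchors at the start of the string and $ at the end
starts_with_the = bool(re.search(r"^The", text))
ends_with_period = bool(re.search(r"\.$", text))

# ? makes the preceding character optional
colours = re.findall(r"colou?r", "color colour")

print(woodchucks, hashtags, numbers, starts_with_the, ends_with_period, colours)
```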

Data Structure - Processing iterables with a functional style

Note: Functional style can be replaced with list comprehension or generator expressions

Inheritance

from abc import ABC
from abc import ABCMeta
Difference between importing ABC and ABCMeta
- TLDR: ABC is a helper class whose metaclass is ABCMeta; both serve the same purpose, and ABC is easier to write.
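
A minimal sketch showing both spellings side by side. The `Shape`, `ShapeMeta`, and `Square` classes are hypothetical.

```python
from abc import ABC, ABCMeta, abstractmethod


class Shape(ABC):                       # shorthand for metaclass=ABCMeta
    @abstractmethod
    def area(self):
        ...


class ShapeMeta(metaclass=ABCMeta):     # equivalent, written with the metaclass directly
    @abstractmethod
    def area(self):
        ...


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


try:
    Shape()                             # abstract classes cannot be instantiated
    instantiated = True
except TypeError:
    instantiated = False

print(Square(3).area(), instantiated, type(Shape) is ABCMeta)
```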

Logging

Built-In Logging

Basic:

import logging
import sys
logger = logging.getLogger(__name__)
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

Advanced configuration log to stdout
Advanced configuration log to file
Log with variables: logging.error(f"Key {a} is missing")
Log exception
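
A sketch of logging with variables and logging an exception. It writes to an in-memory stream so the output can be inspected; in practice the handler would typically point at sys.stdout or a file. The logger name "demo" and the format string are arbitrary choices for this example.

```python
import io
import logging

stream = io.StringIO()                       # stands in for sys.stdout or a file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("demo")           # "demo" is an arbitrary logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False                     # keep this example's output in one place

key = "user_id"
logger.error(f"Key {key} is missing")

try:
    1 / 0
except ZeroDivisionError:
    logger.exception("Division failed")      # logs at ERROR level with the traceback

logger.debug("hidden message")               # below the INFO threshold, so not emitted
output = stream.getvalue()
print(output)
```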

Logging Others

Logging with module icecream

Design Patterns

Built-in Decorators

Class Method
Static Method
dataclass
- dataclass hello world
Abstract class with ABCMeta and @abstractmethod
Property Setting
@property to prevent setting value
1. Native Verbose Method
2. Using built-in property function
3. Using decorator
- getter: @property
- setter: @{variable}.setter
- deleter: @{variable}.deleter
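
The decorator form of property setting can be sketched as follows. The `Temperature` class is a hypothetical example; a property without a setter is read-only, which is one way to prevent a value from being set.

```python
class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius

    @property
    def celsius(self):              # getter
        return self._celsius

    @celsius.setter
    def celsius(self, value):       # setter with validation
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value

    @property
    def fahrenheit(self):           # read-only: no setter is defined
        return self._celsius * 9 / 5 + 32


t = Temperature(25)
t.celsius = 30
try:
    t.fahrenheit = 100              # raises AttributeError because there is no setter
    read_only = False
except AttributeError:
    read_only = True

print(t.celsius, t.fahrenheit, read_only)
```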

Type Checking, Data Validation

Module typing: Type hint & annotations
- Dict
- List
- Tuple
- Set
- Any
- Union
Module pydantic: Data parsing and validation library :TODO

Others

Kill after x amount of time if process not complete

Networking

Get IP from domain name: import socket; socket.gethostbyname("www.google.com")

Concurrency

Built-in Concurrency Library: Asyncio

Simple example with asyncio
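
A minimal asyncio sketch, assuming the goal is to run two coroutines concurrently. The `fetch` coroutine is hypothetical, and `asyncio.sleep` stands in for real I/O such as a network call.

```python
import asyncio


async def fetch(name, delay):
    # asyncio.sleep yields control to the event loop while "waiting"
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather runs the coroutines concurrently and returns results in argument order
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


results = asyncio.run(main())
print(results)
```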

Hashing

Password hashing with library bcrypt - saltround

Web

Webhook

Software Development

REST

FastAPI

Requests

Get data from url

Database

Connect to db with sqlalchemy

PostgreSQL

Postgres connect to AWS RDS
Local Node
~~Save and load image between REST and Postgres~~ Obsolete: large files (including image) should be saved to storage
~~Save and load video between REST and Postgres~~ Obsolete: large files (including image) should be saved to storage

Cloud

AWS

Postgres connect to AWS RDS

S3: Scalable Storage

Note:

What is a bucket in S3

A bucket is a container for objects stored in Amazon S3, which can contain files and folders. You can store any number of objects in a bucket and can have up to 100 buckets in your account.

Machine Learning

Pytorch

Check if cuda is available - import torch; torch.cuda.is_available()
Softmax

Torch Tensor

Torch Tensor Creation

Create tensor of zeros with shape like another tensor: torch.zeros_like(another_tensor)
Create tensor of zeros with shape (tuple): torch.zeros(shape_in_tuple)
Create tensor of ones with shape like another tensor: torch.ones_like(another_tensor)
Create tensor of ones with shape (tuple): torch.ones(shape_in_tuple)
Create tensor of random floating value between 0-1 with shape like another tensor:
torch.rand_like(another_tensor, dtype = torch.float)
Create tensor of random floating value between 0-1 with shape (tuple):
torch.rand(shape_in_tuple)

Torch Tensor Info Extraction

Given torch.tensor buffer = tensor(4), get the value by - id = buffer.item()
Given torch.tensor, get the argmax of each row - torch.argmax(buffer, dim=<(int)dimension_to_reduce>)
Tensor to cuda - inputs = inputs.to("cuda")
Tensor shape - tensor.shape
Tensor data types - tensor.dtype
Device tensor is stored on - tensor.device
Torch tensor(single value) to value: tensorarray.item()
Retrieve subset of torch tensor by row index: tensor[<row_number>, :] / tensor[<row_number_from>:<row_number_to>, :]
Retrieve subset of torch tensor by column index: tensor[:, <column_number_from>:<column_number_to>]

Torch Tensor Conversion

List to torch tensor - torch.tensor(listimp)
Numpy array to torch tensor - torch.from_numpy(np_array)
Image to torch tensor
Torch tensor to image

Torch Tensor Operation

Torch tensor value change by indexing and conditions
Concatenate tensor according to dimension (0 for adding rows, 1 for adding columns):
torch.cat([<tensor_1>, <tensor_2>, ...], dim = <dimension_number>)

Dataset Loader, Iterator

torch.utils.data.Dataset: stores the samples and their corresponding labels
torch.utils.data.DataLoader: wraps an iterable around the Dataset to enable easy access to the samples

Torch Tensor In/Out

Save torch tensor to file: torch.save(x : torch.tensor, tensorfile :str)
Load torch tensor from file: torch.load(tensorfile :str)

Torch Dataset

Image Datasets
- Fashion MNIST Torch
  
  Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. Each example comprises a 28×28 grayscale image and an associated label from one of 10 classes.
Text Datasets
Audio Datasets

Huggingface

Send model to cuda - model.to('cuda:0') or model.cuda()
Overview of DatasetDict
DatasetDict from Pandas Dataframe

Computer Vision

Computer Vision - Basic

Get image shape: img.shape
Create a color image: image = np.zeros((h,w,3), np.uint8)
Read/Write image:
Read image from url
Pause to display image or wait for an input: cv2.waitKey(0)
Save an image: cv2.imwrite(pathtoimg : str, img : numpy.ndarray)
Show an image in window: cv2.imshow(windowname : str, frame : np.array)

Show an image in Jupyter notebook

from IPython.display import Image
Image(filename=pathtoimg : str)

Flip image: frame = cv2.flip(frame, flipcode : int)
- Positive flip code for flip on y axis (left right flip)
- 0 for flip on x axis (up down)
- Negative for flipping around both axes

Computer Vision - Intermediate

Computer Vision - Filter

Blur with averaging mask: cv2.blur(img,(5,5))
GaussianBlur: blur = cv2.GaussianBlur(img,(5,5),0)
- Note: Kernel size (5, 5) must be positive and odd. A larger kernel size gives a stronger blur.
Blurring region of image

Computer Vision - Video Stream

Concat multiple video streams to show side by side: 2 video streams / 3 video streams
Save stream to video output
Read in video stream from a file
Read in stream from camera
video arrays (in opencv) -> bytes -> np.array -> video arrays (in opencv)
Merge audio with video
Check if video comes with audio
Split audio from video

Computer Vision - Other

Overlay image
Resizing frame: outframe = cv2.resize(frame, (w, h))
Set color to rectangle region
Color to gray image: gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Remove background
Add channel to image
