🔥🔥🔥
🔥🔥🔥
THIS REPOSITORY IS NOW ARCHIVED. UPDATES CONTINUE AT
https://github.com/whitepawglobal/bite-size-python
🔥🔥🔥
🔥🔥🔥
Create environment (Only for the first time)
git clone https://github.com/codenamewei/pydata-science-playground.git
cd <path-to>/pydata-science-playground
conda env create -f config.yml
Activate environment
conda activate pyplayground
Install package with pip
pip install <package-name>
Example: pip install numpy
- For more pip commands, check out pip guidelines document
Install package with conda
conda install <package>
Example: conda install numpy
- For more conda commands, check out conda guidelines document
- Single Line Comment:
# sample text
- Multi Lines Comment:
""" Hello World! Nice to meet all of you cookie monsters! """
- Define Nan, Infinite
- Sum up an array:
sum(arr)
- Round a number to a given number of decimal places:
round(value, 1)
- Calculate percentile
- Power of a number:
pow(base_number, exponent_number)
- Square root of a number:
import math; math.sqrt(number)
- Logarithm / Log
- Log to the base of 2:
- Numpy:
import numpy as np; np.log2(10)
- Math:
import math; math.log2(10)
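The arithmetic helpers listed above can be tried together in one short script. This is a minimal sketch using only the standard library; the variable names are illustrative and not taken from the notebooks.

```python
import math

arr = [1, 2, 3, 4]
total = sum(arr)                      # 10

value = 3.14159
rounded = round(value, 1)             # rounds to the nearest value: 3.1

power = pow(2, 10)                    # 1024
root = math.sqrt(16)                  # 4.0
log_base_2 = math.log2(8)             # 3.0

# NaN and infinity can be defined without any third-party package.
nan_value = float("nan")
inf_value = float("inf")

print(total, rounded, power, root, log_base_2)
print(math.isnan(nan_value), math.isinf(inf_value))
```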
- Plotting of log to the base of 2
- Numpy:
- Format floating value to n decimal:
"%.2f" % floating_var
Notes:
Difference between bytes() and bytearray() is that bytes() returns an object that cannot be modified (immutable),
and bytearray() returns an object that can be modified (mutable).
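A minimal sketch of the difference described in the note: item assignment fails on a bytes object and succeeds on a bytearray. The variable names are illustrative.

```python
# bytes is immutable: item assignment raises TypeError.
immutable = bytes([72, 105])          # b'Hi'
try:
    immutable[0] = 74
    bytes_modified = True
except TypeError:
    bytes_modified = False

# bytearray is mutable: the same assignment changes the object in place.
mutable = bytearray([72, 105])
mutable[0] = 74                       # replace 'H' (72) with 'J' (74)

print(bytes_modified)                 # False
print(bytes(mutable))                 # b'Ji'
```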
- Numpy <> Bytes, Bytes <> Numpy
- Bytes -> String:
bytesobj.decode("utf-8")
- String -> Bytes:
strobj.encode("utf-8")
- Bytes -> Multimedia file (video/audio)
- Check bytes encoding
- To Bytes:
bytes(<value>)
- Get size of bytes object in memory (includes object overhead; use len(bytesobject) for the number of bytes):
import sys; sys.getsizeof(bytesobject)
- Split bytes to chunks
Notes:
Difference between bytes() and bytearray() is that bytes() returns an object that cannot be modified (immutable),
and bytearray() returns an object that can be modified (mutable).
- Integer to Bytearray
- Native Array to Bytearray
- Numpy Array to Bytearray
- [Image as Bytearray](notebooks/cv/image_as bytearray.ipynb)
- Check bytes array encoding
- To ByteArray:
bytearray(<value>)
- Numpy basic
- Get numpy shape:
nparray.shape
- Numpy array to list:
nparray.tolist()
- Change datatype:
nparray = nparray.astype(<dtype>)
Example: nparray = nparray.astype("uint8")
- Numpy NaN (Not A Number): Constant to act as a placeholder for any missing numerical values in the array:
np.nan (older NumPy releases also accept the aliases np.NaN / np.NAN; np.NaN was removed in NumPy 2.0)
- Numpy multiply by a value:
nparray = nparray * 255
- Numpy array to image
- Numpy <> Binary File(.npy)
- Use of
numpy.where
- Generate string with parameter
- Using f-string:
print(f'Completed part {id}')
- Generate string with templates
- String formatting method:
print('Completed part {part_id}'.format(part_id=part_id))
- Create string with the repr() form of a variable:
varname="world"; print(f"Hello {varname!r}")
- Check if string is empty, len = 0:
if not strvar:
- Check if string contains digit:
any(ch.isdigit() for ch in str1) #return True if there's digit
- Check file extension: notebooks/string/check_file_extension.ipynb
- Capitalize a string:
strvar.capitalize()
- Uppercase a string:
strvar.upper()
- Lowercase a string:
strvar.lower()
- Get substring from a string:
strvar[<begin-index>:<end-index>] / strvar[<begin-index>:] / strvar[:<end-index>]
- Remove white spaces in the beginning and end:
strvar.strip()
- Swap existing upper and lower case:
strvar.swapcase()
- Capitalize every first letter of a word:
strvar.title()
- Splitting string:
- Split a string based on separator:
strvar.split(separator)
Example: strvar.split("x")
- Split on white space:
strvar.split()
- To split into individual characters, do this instead:
[*"ABCDE"]
Result: ["A", "B", "C", "D", "E"]
- Check if string starts with a substring:
strvar.startswith(<substring>)
- Check if string ends with a substring:
strvar.endswith(<substring>)
- Find the position of a substring/specific character. Returns -1 if not found:
strvar.find(<substring>)
- String get substring with index:
str[startindex:endindex]
- Replace string/character with intended string/character:
strout = strin.replace(" ", "_")
- Replace multiple string/characters with intended string/character
- Generate random string
- List to string:
<separator>.join(list). Example: ', '.join(listbuffer)
- Generate unique identifier UUID
- Validate if a string is UUID
- Compare whether two UUIDs are the same
- UUID to string:
str(uuidparam)
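A minimal sketch of the UUID items above using the standard-library uuid module. The helper name is_valid_uuid is hypothetical and not taken from the notebooks.

```python
import uuid


def is_valid_uuid(text: str) -> bool:
    """Return True if text parses as a UUID (hypothetical helper)."""
    try:
        uuid.UUID(text)
    except ValueError:
        return False
    return True


new_id = uuid.uuid4()                 # random UUID
as_text = str(new_id)                 # UUID to string

# Two UUID objects compare equal when they hold the same value.
same = uuid.UUID(as_text) == new_id

print(is_valid_uuid(as_text))         # True
print(is_valid_uuid("not-a-uuid"))    # False
print(same)                           # True
```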
- List of str to int:
list(map(int, arr))
- List with range of values:
list(range(...))
- Split str to list of str:
arr.split(" ")
- Check for empty list:
if not mylist:
- Find if a value in a list:
if value in mylist:
/ if value not in mylist:
- Sort an array in place:
arr.sort()
/ Return a sorted copy: sorted(arr)
- Get index of a value:
arr.index(value)
- Add one more value to existing list:
arr.append(value)
- Extend list with values in another list:
arr.extend(arr2)
- Remove an item from the list:
arr.remove(item)
- Check for empty list:
if not arr: #True when arr is an empty list
- Check if all items in a list (subset) exist in another list, returns boolean:
set(b).issubset(v)
- Build list of same values:
['100'] * 20 # 20 items of the value '100'
- Change values of list with List Comprehension:
[func(a) for a in sample_list]
- Iteration of list with index:
for index, value in enumerate(inlist):
- Iteration over two lists:
for item1, item2 in zip(list1, list2):
- Count occurrence of items in list
- Get maximum value in a list of numbers (even strings):
max(samplelist)
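A few of the list recipes above, combined into one runnable sketch. The sample data is made up for illustration; counting uses collections.Counter, which is one of several possible approaches.

```python
from collections import Counter

fruits = ["apple", "pear", "apple", "fig"]
prices = [3, 2, 3, 5]

# Count occurrences of items in a list.
counts = Counter(fruits)

# Iterate with an index, and over two lists at once.
indexed = [(index, value) for index, value in enumerate(fruits)]
paired = [(item1, item2) for item1, item2 in zip(fruits, prices)]

# List of str to int, subset check, list of repeated values.
as_ints = list(map(int, ["1", "2", "3"]))
is_subset = set(["fig", "pear"]).issubset(fruits)
repeated = ["100"] * 3

# max() also works on strings, but compares them lexicographically.
largest_number = max(prices)
largest_string = max(["10", "9"])     # "9", because "9" > "1"

print(counts["apple"], paired[0], as_ints, is_subset, largest_string)
```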
- Define dict with str keys
- Add new key value pair:
lut.update({"key2":"value2"}) / lut["key2"] = "value2"
- Remove key-value pair by referring to specific key
- Get keys as list:
list(lut.keys())
- Get values as list:
list(lut.values())
- Create dict from list:
{i: 0 for i in arr}
- Handling missing items in dict
- Iteration to dict to get keys and values
- Save/load dictionary to/from a file
- Revert or inverse a dictionary mapping:
inv_map = {v: k for k, v in my_map.items()}
- Copy by value:
sampledict.copy()
- Identify whether any item in an iterable is truthy (True/1):
any(sample_list) #returns single value True/False
- Zip multiple iterables
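The dictionary recipes above can be sketched as follows. The name lut follows the snippets in the list; the sample keys and values are made up, and defaultdict is shown as one possible way to handle missing items.

```python
from collections import defaultdict

lut = {"a": 1, "b": 2}

lut["c"] = 3                          # add a new key-value pair
removed = lut.pop("a")                # remove by key, returns the value 1
missing = lut.get("zzz", 0)           # missing key falls back to a default

keys = list(lut.keys())               # ['b', 'c']
values = list(lut.values())           # [2, 3]
inverse = {v: k for k, v in lut.items()}

# defaultdict creates a default value the first time a key is accessed.
letter_count = defaultdict(int)
for letter in "abca":
    letter_count[letter] += 1

# copy() gives an independent top-level copy.
lut_copy = lut.copy()
lut_copy["d"] = 4

print(lut, inverse, dict(letter_count), any([0, 0, 1]))
```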
- Dataframe basic
- Get # rows and columns
- Get summary/infos about dataframe
- Get data types
- Dataframe/Series Min, Max, Median, General Description
- Get rows name (index) and columns name (column)
- Get a glimpse of dataframe
- Get subset of a dataframe by rows/by columns
- Get rows by finding matching values from a specific column
- Check if a column name exists in dataframe -
if 'code' in df.columns:
- Iteration of each rows in a dataframe
- Check if dataframe is empty:
df.empty #return boolean
- Get dataframe from list
- Build dataframe with columns name
column_list = ["a", "b"]; df = pd.DataFrame(columns = column_list)
- Build a new dataframe from a subset of columns from another dataframe
- Get subset of dataframe, sample columns with specific criteria
- Sample by percentage
- Sample by # of rows specified
- Sample by matching to a value
- Column to list:
df.columns.tolist()
- Sample rows:
df = df.sample(frac=1).reset_index(drop=True)
- Referring to dataframe column by key or by string
- Concatenate dataframe
- Concatenate by adding rows
- Append string to all rows of a column
- Reset index without creating new (index) column:
df.reset_index(drop=True)
- Assign df by copy instead of reference -
df.copy()
- Shuffle rows of df:
df = df.sample(frac=1).reset_index(drop=True)
- Pandas with multiple index
- Series to value
- Series/Dataframe to numpy array:
input.to_numpy()
- Series iteration:
for index, item in seriesf.items():
- Series to dict:
seriesf.to_dict()
- Create new column and assign value according to another column
- Assign values by lambda and df.assign
- Dataframe append rows
- Drop duplicates for df / subset, keep one copy and remove all
- Remove/drop rows where specific column matched value
- Remove specific columns with column name
- Drop rows by index
- Drop rows/columns with np.NaN:
df3 = df3.dropna(axis = 0) #rows; use axis = 1 to drop columns
- pivot table:
:TODO
- Drawback: Not able to do filtering selection
- Merge two dataframes based on certain column values
- Filter with function isin()
- Filter df with item not in list
- Filter with function query()
- Find with loc
df.loc[df['address'].eq('johndoe@gmail.com')] #filter with one value
df.loc[df.a.eq(123) & df.b.eq("helloworld")] #filter with one value in multiple columns
df.loc[df.a.isin(valuelist)] #filter with a few values in a list
- Assign value to specific column(s) by matching value
- Get a subset of dataframe by rows -
df.iloc[<from_rows>:<to_rows>, :]
- Count items and filter by counter values
- Retrieve columns name which match specific str
- Read in excel with specific sheet name:
pd.read_excel(<url>, sheet_name = "Sheet1", engine = "openpyxl")
- Note: Install engine by
pip install openpyxl
- Read number of sheets in excel
- Save excel:
df.to_excel('file_name', index = False)
- Write to multiple sheets
- Read csv with other delimiter
pd.read_csv(<path-to-file>, delimiter = '\x01')
- Read csv with bad lines
pd.read_csv(<path-to-file>, on_bad_lines='skip')
- Note:
pd.read_csv(<path>, error_bad_lines = False)
is deprecated (and removed in pandas 2.0)
- Read csv with encoding
pd.read_csv('file name', encoding = 'utf-8')
- Save to csv
df.to_csv('file name', index = False)
- Note: Setting
index = False
is important to prevent an extra index column from being saved.
- Save to csv with encoding
df.to_csv('file name', encoding = 'utf-8')
Pandas Parquet In/Out
- Read in parquet:
pd.read_parquet(...)
- Write to parquet:
df.to_parquet(...)
Note: Pickle has security risks and is slow to serialize (even compared to csv and json). Avoid using it.
- Read in pickle to dataframe:
df = pd.read_pickle(<file_name>) # ends with .pkl
- Save to pickle:
df.to_pickle(<file_name>)
- Generate random integer within [min, max], both ends inclusive:
from random import randint; randint(0, 100) #between 0 and 100, inclusive
- Generate random floating value:
from random import random; random()
- Randomly choosing an item out from a list:
import random; random.choice([123, 456, 378])
- Generate list with random number:
import random; random.sample(range(10, 30), 5)
- Example shown where 5 unique random numbers are drawn from 10 to 29 (range(10, 30) excludes 30)
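A short sketch of the random helpers above. The call to random.seed is an assumption added only to make the demo repeatable; it is not part of the original snippets.

```python
import random

random.seed(42)  # assumption: fixed seed so repeated runs give the same values

roll = random.randint(0, 100)                     # 0 <= roll <= 100
fraction = random.random()                        # 0.0 <= fraction < 1.0
pick = random.choice([123, 456, 378])             # one item from the list
unique_numbers = random.sample(range(10, 30), 5)  # 5 distinct values, 10..29

print(roll, fraction, pick, unique_numbers)
```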
- ValueError: argument of the correct data type but an inappropriate value
- TypeError: the data type of an object is incorrect
- IndexError: Raised when a sequence subscript is out of range
- KeyError: When key cannot be found
- ZeroDivisionError: when a number is divided by zero
- OSError: error from an os-specific function
- FileNotFoundError: when a file or directory is requested but doesn’t exist
- NotImplementedError: commonly raised when an abstract method is not implemented in a derived class
- NameError: reference to some name (variable, function, class) that hasn’t been defined
- AttributeError: reference to certain attribute in a class which does not exist
- ImportError: Trouble loading a module
- Submodule
- ModuleNotFoundError: the module being imported cannot be found (importing a name that does not exist inside an existing module raises the parent ImportError)
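To see which exception each situation above produces, the sketch below triggers several of them on purpose. The helper classify and the sample expressions are hypothetical and exist only for this demonstration.

```python
def classify(action):
    """Run a callable and return the name of the exception it raises (hypothetical helper)."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return "no error"


results = {
    "ValueError": classify(lambda: int("abc")),
    "TypeError": classify(lambda: "a" + 1),
    "IndexError": classify(lambda: [1, 2][5]),
    "KeyError": classify(lambda: {"a": 1}["b"]),
    "ZeroDivisionError": classify(lambda: 1 / 0),
    "FileNotFoundError": classify(lambda: open("/no/such/dir/file.txt")),
    "NameError": classify(lambda: undefined_name_xyz),
    "AttributeError": classify(lambda: (1).missing_attribute),
    "ModuleNotFoundError": classify(lambda: __import__("module_that_does_not_exist_xyz")),
}

for expected, raised in results.items():
    print(f"{expected}: raised {raised}")

# FileNotFoundError is a subclass of OSError, and ModuleNotFoundError of ImportError.
print(issubclass(FileNotFoundError, OSError), issubclass(ModuleNotFoundError, ImportError))
```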
-
The character used by the operating system to separate pathname components:
os.sep
-
Iterate through a path to get files/folders of all the subpaths
-
Write file:
f.write(str)
-
print without new line:
print(..., end="")
-
Get environment variable (second param is optional):
import os; os.getenv(<PATH_NAME> : str, <alternative-return-value>: str)
-
Check if path is a folder:
os.path.isdir(<path>)
-
Get file size in bytes:
from pathlib import Path; outsize : int = Path(inputfilepath).stat().st_size
or import os; outsize : int = os.path.getsize(inputfilepath)
-
Create folder:
os.mkdir(<path>)
-
Create folders recursively:
os.makedirs(<path>)
-
Get folder path out of given path with filename:
os.path.dirname(<path-to-file>)
-
Expand home directory:
os.path.expanduser('~')
-
Get current working directory:
os.getcwd()
-
Get the list of all files and directories in the specified directory (does not expand to items in the child folders):
os.listdir(<path>)
-
Get current file path (getcwd returns the working directory; this returns the directory of the individual .py file):
os.path.dirname(os.path.abspath(__file__))
-
Get filename from path:
os.path.basename(configfilepath)
-
Split extension from rest of path(Including .):
filename, ext = os.path.splitext(path)
-
Append a path to the module search path:
sys.path.append(<path>)
-
Check if path exist:
os.path.exists(<path>)
-
Remove a file:
os.remove(<path>)
-
Get size of a file in bytes:
os.path.getsize(<path>)
or from pathlib import Path; Path(<path>).stat().st_size
-
Removes an empty directory:
os.rmdir(<path>)
-
Deletes a directory and all its contents:
shutil.rmtree(<path>)
-
open(<path-to-file>, mode)
- `r`: Open a text file for reading text
- `w`: Open a text file for writing text
- `a`: Open a text file for appending text
- [`b`: Open to read/write as bytes](notebooks/cv/image_as_byte.ipynb)

Reading a file has 3 functions:
- read() or read(size): read all / size as one string.
- readline(): read a single line from a text file and return the line as a string.
- readlines(): read all the lines of the text file into a list of strings.

Writing and closing:
- write(<param> : str): write in param. Need to explicitly add \n to split lines.
- close(): close the file object
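The file modes and read/write functions above can be exercised against a temporary folder. This is a sketch under the assumption that a throwaway file named sample.txt is acceptable; the with statement closes each file automatically, so close() is not called explicitly.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")   # hypothetical file name

    # "w" creates or truncates the file; newlines must be added explicitly.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\n")
        f.write("second line\n")

    # "a" appends to the existing content.
    with open(path, "a", encoding="utf-8") as f:
        f.write("third line\n")

    # "r" reads text: readline() returns one line, readlines() the rest as a list.
    with open(path, "r", encoding="utf-8") as f:
        first = f.readline()
        rest = f.readlines()

    # "rb" reads the same file as bytes.
    with open(path, "rb") as f:
        raw = f.read()

    size = os.path.getsize(path)
    name, ext = os.path.splitext(os.path.basename(path))
    exists_inside = os.path.exists(path)

# The temporary folder and its content are removed once the block ends.
exists_after = os.path.exists(path)
print(first, rest, size, name, ext, exists_inside, exists_after)
```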
- Get system input
- Check operating system:
import platform; platform.system()
- Check if port is open/close
- Measure time prior and after
- Add delay to execution of the program by pausing:
import time;time.sleep(seconds)
- Note: stops the execution of current thread only
- Effective way to view object address and object
- Reserved methods in class
- The magic variable *args and **kwargs
- Check if object is of specified type:
isinstance(obj, MyClass)
/ isinstance(obj, (type1, type2) : tuple)
- Deep Copy, Shallow Copy
- Copy list by value (shallow copy):
list_cp = list_ori[:]
(Note: list_cp = list_ori
copies by reference)
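A minimal sketch contrasting assignment by reference, a shallow copy made by slicing, and a deep copy. The nested list is made up to show where the shallow copy still shares data with the original.

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                  # same object, no copy at all
shallow = original[:]             # new outer list, inner lists are shared
deep = copy.deepcopy(original)    # fully independent copy

original[0][0] = 99               # mutate a nested list
original.append([5, 6])           # mutate the outer list

print(alias is original)          # True
print(len(shallow), shallow[0])   # 2 [99, 2]
print(len(deep), deep[0])         # 2 [1, 2]
```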
- Define dataclass
- dataclass 1
- dataclass 2
- Magic methods such as __init__, __repr__ and __eq__ are generated when a class is defined with dataclass; __dict__ is available as on any regular class instance
- Implement Enum in Python
- Serialize class object
__dict__
: returns all instance attributes of an object (such as those set in __init__): obj.__dict__
__str__
: returns the string representation of the object: def __str__(self):
__eq__
: compares the instances of the class: def __eq__(self, other):
__repr__
: represents a class's objects as a string. Call it with repr(obj)
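A small sketch of the four reserved names above on a hand-written class. The class Point and its fields are hypothetical.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous representation, used by repr(obj) and in the REPL.
        return f"Point(x={self.x!r}, y={self.y!r})"

    def __str__(self):
        # Readable representation, used by str(obj) and print(obj).
        return f"({self.x}, {self.y})"

    def __eq__(self, other):
        # Compare by value instead of by identity.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
print(p.__dict__)            # {'x': 1, 'y': 2}
print(str(p))                # (1, 2)
print(repr(p))               # Point(x=1, y=2)
print(p == Point(1, 2))      # True
```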
- Find matching word/character 1
- Introduction of functions in re library
- Square brackets for upper and lower case
[Ww]oodchuck
- Find matching word/character 2
- Optional character (zero or one occurrence) with
?
- Zero or more occurrences of the preceding character with
*
- One or more occurrences of the preceding character with
+
- Any single character with
.
- Find matching word/character 3
- Whitespace character find with
\s
- Non-whitespace character find with
\S
- Find matching word/character 4
- Caret before square bracket:
^[]
to indicate beginning
- Dollar sign after square bracket:
[]$
to indicate ending
- Negation
- Disjunction
- To match a series of patterns with parenthesis.
- Extract hashtags
- Extract numbers from string
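Several of the patterns above, applied with the standard-library re module. The sample sentence is made up for illustration.

```python
import re

text = "Woodchuck and woodchuck posted #python and #regex in 2023, 42 times"

# Square brackets match either case of the first letter.
woodchucks = re.findall(r"[Ww]oodchuck", text)

# Extract hashtags (the parentheses capture the word after '#').
hashtags = re.findall(r"#(\w+)", text)

# Extract numbers from a string.
numbers = [int(n) for n in re.findall(r"\d+", text)]

# '?' makes the preceding character optional.
optional_u = re.findall(r"colou?r", "color colour")

# '|' is a disjunction between alternatives.
pets = re.findall(r"cat|dog", "cat dog bird")

# '^' anchors the pattern to the beginning of the string.
starts_upper = bool(re.match(r"^[A-Z]", text))

# '\s' matches whitespace; splitting on runs of it yields the words.
words = re.split(r"\s+", "a  b\tc")

print(woodchucks, hashtags, numbers, optional_u, pets, starts_upper, words)
```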
- Produce a new iterable with map()
- Generate a new iterable with Boolean-return function with filter()
- Produce a single cumulative value from iterable with reduce()
- Condition checking with any()
- Multiple function declaration with singledispatch
Note: Functional style can be replaced with list comprehension or generator expressions
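A minimal sketch of the functional helpers above next to their comprehension equivalents, plus a small singledispatch example. The function describe and the sample numbers are hypothetical.

```python
from functools import reduce, singledispatch

numbers = [1, 2, 3, 4]

# Functional style.
squares = list(map(lambda n: n * n, numbers))
evens = list(filter(lambda n: n % 2 == 0, numbers))
total = reduce(lambda acc, n: acc + n, numbers, 0)
has_large = any(n > 3 for n in numbers)

# Equivalent list comprehensions.
squares_comp = [n * n for n in numbers]
evens_comp = [n for n in numbers if n % 2 == 0]


# singledispatch picks an implementation based on the type of the first argument.
@singledispatch
def describe(value):
    return "object"


@describe.register
def _(value: int):
    return "int"


@describe.register
def _(value: str):
    return "str"


print(squares, evens, total, has_large)
print(describe(1), describe("a"), describe(1.5))
```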
- from abc import ABC
- from abc import ABCMeta
- Difference between importing ABC or ABCMeta
- TLDR: ABC is a wrapper of ABCMeta. Both serve the same purpose, and the former is easier to write.
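A short sketch of the two spellings. The class names are hypothetical; both base classes prevent instantiation until every abstract method is implemented.

```python
from abc import ABC, ABCMeta, abstractmethod


class ShapeWithABC(ABC):                     # shorter spelling
    @abstractmethod
    def area(self):
        ...


class ShapeWithMeta(metaclass=ABCMeta):      # explicit metaclass spelling
    @abstractmethod
    def area(self):
        ...


class Square(ShapeWithABC):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


# An abstract class cannot be instantiated directly.
try:
    ShapeWithMeta()
    instantiated = True
except TypeError:
    instantiated = False

print(Square(3).area())                      # 9
print(instantiated)                          # False
print(type(ShapeWithABC) is ABCMeta)         # True
```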
- Unnamed arguments
- Named arguments:
:TODO
- Filename as argument
- Basic:
import logging; import sys; logger = logging.getLogger(__name__); logging.basicConfig(stream=sys.stdout, level=logging.INFO)
- Advanced configuration log to stdout
- Advanced configuration log to file
- Log with variables:
logging.error(f"Key {a} is missing")
- Log exception
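A minimal sketch of logging to stdout with a variable in the message and with an exception traceback. The logger name demo, the format string, and the sample key are assumptions made for illustration.

```python
import logging
import sys

logger = logging.getLogger("demo")           # hypothetical logger name
logger.setLevel(logging.INFO)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
logger.addHandler(handler)

key = "user_id"
logger.info("Starting lookup")
logger.error(f"Key {key} is missing")

try:
    {}[key]
except KeyError:
    # logger.exception logs at ERROR level and appends the traceback.
    logger.exception("Lookup failed")
```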
- Class Method
- Static Method
- dataclass
- Abstract class with ABCMeta and @abstractmethod
- Property Setting
- @property to prevent setting value
- Native Verbose Method
- Using built-in property function
- Using decorator
- getter: @property
- setter: @{variable}.setter
- deleter: @{variable}.deleter
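The decorator form of getter, setter and deleter can be sketched as follows. The class Temperature and its validation rule are hypothetical; kelvin has no setter, so it shows how @property alone prevents setting a value.

```python
class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius

    @property
    def celsius(self):                 # getter
        return self._celsius

    @celsius.setter
    def celsius(self, value):          # setter with validation
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value

    @celsius.deleter
    def celsius(self):                 # deleter
        del self._celsius

    @property
    def kelvin(self):                  # read-only: no setter defined
        return self._celsius + 273.15


t = Temperature(20.0)
t.celsius = 25.0

try:
    t.kelvin = 0.0
    kelvin_settable = True
except AttributeError:
    kelvin_settable = False

try:
    t.celsius = -300.0
    invalid_accepted = True
except ValueError:
    invalid_accepted = False

print(t.celsius, kelvin_settable, invalid_accepted)   # 25.0 False False
```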
- Module typing: Type hint & annotations
- Module pydantic: Data parsing and validation library
:TODO
- Get IP from domain name:
import socket; socket.gethostbyname("www.google.com")
- Form Data
- Send image via UploadFile
- Client uploads file to FastAPI UploadFile and gets response
- Return content from url and write image
- Postgres connect to AWS RDS
- Local Node
Save and load image between REST and Postgres (Obsolete: large files (including images) should be saved to storage)
Save and load video between REST and Postgres (Obsolete: large files (including videos) should be saved to storage)
- List name of buckets
- List objects in a specific bucket
- Upload file with function upload_file or upload_fileobj
- Upload multipart
- Upload multipart with multiple workers
- Get object from S3
Note:
-
What is a bucket in S3
A bucket is a container for objects stored in Amazon S3, which can contain files and folders. You can store any number of objects in a bucket and can have up to 100 buckets in your account by default (this quota can change; check the current AWS limits).
- Check if cuda is available -
import torch; torch.cuda.is_available()
- Softmax
Torch Tensor Creation
- Create tensor of zeros with shape like another tensor:
torch.zeros_like(another_tensor)
- Create tensor of zeros with shape (tuple):
torch.zeros(shape_in_tuple)
- Create tensor of ones with shape like another tensor:
torch.ones_like(another_tensor)
- Create tensor of ones with shape (tuple):
torch.ones(shape_in_tuple)
- Create tensor of random floating value between 0-1 with shape like another tensor:
torch.rand_like(another_tensor, dtype = torch.float)
- Create tensor of random floating value between 0-1 with shape (tuple):
torch.rand(shape_in_tuple)
Torch Tensor Info Extraction
- Given torch.tensor buffer = tensor(4), get the value by -
id = buffer.item()
- Given torch.tensor, get the argmax of each row -
torch.argmax(buffer, dim=<(int)dimension_to_reduce>)
- Tensor to cuda -
inputs = inputs.to("cuda")
- Tensor shape -
tensor.shape
- Tensor data types -
tensor.dtype
- Device tensor is stored on -
tensor.device
- Torch tensor(single value) to value:
tensorarray.item()
- Retrieve subset of torch tensor by row index:
tensor[<row_number>, :]
/ tensor[<row_number_from>:<row_number_to>, :]
- Retrieve subset of torch tensor by column index:
tensor[:, <column_number_from>:<column_number_to>]
Torch Tensor Conversion
- List to torch tensor -
torch.tensor(listimp)
- Numpy array to torch tensor -
torch.from_numpy(np_array)
- Image to torch tensor
- Torch tensor to image
Torch Tensor Operation
- Torch tensor value change by indexing and conditions
- Concatenate tensor according to dimension (0 for adding rows, 1 for adding columns):
torch.cat([<tensor_1>, <tensor_2>, ...], dim = <dimension_number>)
Dataset Loader, Iterator
torch.utils.data.Dataset
: stores the samples and their corresponding labels
torch.utils.data.DataLoader
: wraps an iterable around the Dataset to enable easy access to the samples
Torch Tensor In/Out
- Save torch tensor to file:
torch.save(x : torch.tensor, tensorfile :str)
- Load torch tensor from file:
torch.load(tensorfile :str)
-
Fashion MNIST Torch
Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. Each example comprises a 28×28 grayscale image and an associated label from one of 10 classes.
- Send model to cuda -
model.to('cuda:0')
or model.cuda()
- Overview of DatasetDict
- DatasetDict from Pandas Dataframe
- Get image shape:
img.shape
- Create a color image:
image = np.zeros((h,w,3), np.uint8)
- Read/Write image:
- Read image from url
- Pause to display image or wait for an input:
cv2.waitKey(0)
- Save an image:
cv2.imwrite(pathtoimg : str, img : numpy.ndarray)
- Show an image in window:
cv2.imshow(windowname : str, frame : numpy.ndarray)
- Show an image in Jupyter notebook
from IPython.display import Image; Image(filename=pathtoimg : str)
- Flip image:
frame = cv2.flip(frame, flipcode : int)
- Positive flip code for flip on y axis (left right flip)
- 0 for flip on x axis (up down)
- Negative for flipping around both axes
Computer Vision - Filter
- Blur with averaging mask:
cv2.blur(img,(5,5))
- GaussianBlur:
blur = cv2.GaussianBlur(img,(5,5),0)
- Note: Kernel size
(5, 5)
must be positive and odd. The kernel size influences the degree of blurring.
- Blurring region of image
Computer Vision - Video Stream
- Concat multiple video streams to show side by side (2 video streams, 3 video streams)
- Save stream to video output
- Read in video stream from a file
- Read in stream from camera
- video arrays (in opencv) -> bytes -> np.array -> video arrays (in opencv)
- Merge audio with video
- Check if video comes with audio
- Split audio from video
Computer Vision - Other
- Overlay image
- Resizing frame:
outframe = cv2.resize(frame, (w, h))
- Set color to rectangle region
- Color to gray image:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
- Remove background
- Add channel to image