# **Level-up my learning**

### **1. Setup python env variables in zsh** 

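**step1: Find python**\
Find the location where python is installed (for example with `which python3`).
In my case it is installed in:
>/opt/homebrew/bin/python3

**step2: Add python to the paths file**\
Now add the directory that contains it (here `/opt/homebrew/bin`, not the full path to the executable) to the `/etc/paths` file, because each line of that file is a directory to search.
Open the `paths` file using:
>sudo nano /etc/paths

Provide your password and add the directory on its own line. Then press `ctrl+X`, press `Y` to confirm, and press Enter to save the file. After opening a new terminal you can simply type `python3` to run python.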

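As a quick check, assuming Python is already runnable, the interpreter location can also be looked up from Python itself. This is only a sketch using `sys.executable` and `shutil.which` from the standard library; the variable names are illustrative.

```python
import shutil
import sys

# Path of the interpreter that is running this script
current_interpreter = sys.executable

# First `python3` executable found on PATH, or None if there is none
python3_on_path = shutil.which("python3")

print(current_interpreter)
print(python3_on_path)
```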


### **2. Adding python3 alias to python**

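**step1: Add `alias python='python3'`**\
To add the python alias, open the hidden `.zshrc` file in your home directory (in my case `/Users/arsalanamin`) using:
>nano /Users/arsalanamin/.zshrc

Add `alias python='python3'` to the file. Then press `ctrl+X`, press `Y` to confirm, and press Enter to save the file. `sudo` is not needed for a file in your own home directory. Open a new terminal (or run `source ~/.zshrc`) for the alias to take effect.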


### **3. Regular expressions**

**Meta Characters**

|character|Desc|
|--|--|
|[]| A set of characters|
|\ |Signals a special sequence (can also be used to escape special characters)|
|.|Any character (except newline character)|
|^|Starts with|
|$|Ends with|
|*| 0 or more occurrences|
|+| 1 or more occurrences|
|? | 0 or 1 occurrences|
|{}|Exactly the specified number of occurrences|
| \| | Either or |
|() |Capture and group|

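The metacharacters above can be tried out one at a time. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

vowels = re.findall(r"[aeiou]", "regex")             # [] : a set of characters
escaped = re.findall(r"a\.b", "a.b axb")             # \. : a literal dot
starts = bool(re.search(r"^py", "python"))           # ^  : starts with
ends = bool(re.search(r"on$", "python"))             # $  : ends with
optional = re.findall(r"colou?r", "color colour")    # ?  : 0 or 1 occurrences
exact = re.findall(r"\d{3}", "12 345 6789")          # {} : exactly 3 digits
either = re.findall(r"cat|dog", "cat, dog, bird")    # |  : either or
groups = re.search(r"(\d+)-(\d+)", "fax 20712-1234").groups()  # () : capture groups

print(vowels, escaped, starts, ends)
print(optional, exact, either, groups)
```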

**Special Sequences**

|character|Desc|
|--|--|
|\A|Returns a match if the specified characters are at the beginning of the string|
|\d|Returns a match where the string contains digits (numbers from 0-9)|
|\b|Returns a match where the specified characters are at the beginning or at the end of a word, e.g. `r"\bain"` or `r"ain\b"`|
|\s|Returns a match where the string contains a white space character|
|\w|Returns a match where the string contains any word characters (characters from a to Z, digits from 0-9, and the underscore _ character)|
|\Z|Returns a match if the specified characters are at the end of the string|

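A small sketch of the special sequences in action. The sample sentence and variable names are assumptions chosen for illustration only.

```python
import re

text = "The rain in Spain 2024"

at_start = re.findall(r"\AThe", text)     # \A : match only at the beginning of the string
digits = re.findall(r"\d+", text)         # \d : digits
word_end = re.findall(r"ain\b", text)     # \b : 'ain' at the end of a word
word_start = re.findall(r"\bain", text)   # \b : 'ain' at the start of a word (none here)
spaces = re.findall(r"\s", text)          # \s : whitespace characters
words = re.findall(r"\w+", text)          # \w : runs of word characters
at_end = re.findall(r"2024\Z", text)      # \Z : match only at the end of the string

print(at_start, digits, word_end, word_start)
print(len(spaces), words, at_end)
```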
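Commonly used functions of the `re` module:

- `findall`: returns a list of all matches
- `search`: returns a match object for the first match, or `None`
- `split`: splits the string at each match
- `sub`: replaces matches with a given string
- `finditer`: returns an iterator of match objects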

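To show how these functions differ, here is a short sketch that applies each of them with the same pattern. The input string is the one used in the example in these notes; the variable names are illustrative.

```python
import re

s = "abc123def456ghi789"

first = re.search(r"\d+", s)      # match object for the first run of digits
parts = re.split(r"\d+", s)       # pieces of the string between the matches
masked = re.sub(r"\d+", "#", s)   # every match replaced by '#'
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", s)]

print(first.group(), first.span())
print(parts)
print(masked)
print(positions)
```

Note that `split` returns a trailing empty string here because the input ends with a match.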
#### **Example**

```python
import re

s = "abc123def456ghi789"
# Find all occurrences of the pattern \d+ (one or more digits)
re.findall(r'\d+', s)
# Output: ['123', '456', '789']
```

```python
tweet = 'Im goin to write a #hastag element'

# Remove '#' from the tweet
re.sub(r'#', '', tweet)
# Output: 'Im goin to write a hastag element'
```

#### **More examples**

```python
txt = "#bistiProof FollowFriday @France @Fastly @Fasting @Lasting @Roasting @PKuchly57 @Milipol_Paris for\
 being top rope rain roun engaged raged vacinity members in my community in nearity :( and :() this week :)' \
 https://www.youtube.com/watch?v=g8u0wLvvPSs fax number 20712-1234 itit itit\
 +92-234234234 +92-1231415  +92-12311515 +92-12837817"

# any character followed by 'ty'
re.findall(r'.ty', txt)
# Output: ['ity', 'ity', 'ity']
```

```python
# 'i' followed by zero or more occurrences of 't'
re.findall(r'it*', txt)
# Output:
# ['i', 'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i', 'it', 'i',
#  'it', 'i', 'it', 'i', 'it', 'it', 'it', 'it']
```

```python
# 'i' followed by one or more occurrences of 't'
re.findall(r'it+', txt)
# Output: ['it', 'it', 'it', 'it', 'it', 'it', 'it']
```

```python
# either a run of one or more 'm' ('m' followed by zero or more 'm') or the literal 'ged'
re.findall(r'mm*|ged', txt)
# Output: ['ged', 'ged', 'm', 'm', 'm', 'mm', 'm', 'm']
```

```python
# 'com' at the start of a word
re.findall(r'\bcom', txt)
# Output: ['com', 'com']
```

```python
# 'ty' at the end of a word
re.findall(r'ty\b', txt)
# Output: ['ty', 'ty', 'ty']
```

```python
# any word character (letter, digit, or underscore) followed by 'op'
re.findall(r'\wop', txt)
# Output: ['top', 'rop']
```

```python
# 'F' followed by any word character
re.findall(r'F\w', txt)
# Output: ['Fo', 'Fr', 'Fr', 'Fa', 'Fa']
```

```python
# 'f' followed by zero or more whitespace characters
re.findall(r'f\s*', txt)
# Output: ['f ', 'f', 'f']
```

```python
# a word character, then any single character, then 'ged'
re.findall(r'\w.ged', txt)
# Output: ['gaged', 'raged']
```

```python
# 'num' followed by zero or more of any character (greedy, so it runs to the end of the line)
re.findall(r'num.*', txt)
# Output: ['number 20712-1234 itit itit +92-234234234 +92-1231415  +92-12311515 +92-12837817']
```

```python
# remove hyperlinks
# https?://   # match 'http' or 'https' followed by '://'
# .*          # match any number of characters (greedy: everything up to the end of the line)
# [\r\n]*     # match zero or more newline characters (carriage return or line feed)
tweet = 'joim me while I feed the troops. :) https://t.co/ZlcsRuUpPY in the mess'
re.sub(r'https?:\/\/.*[\r\n]*', '', tweet)
# Output: 'joim me while I feed the troops. :) '
# Note: the text after the URL (' in the mess') is removed as well.
```

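Because `.*` is greedy, the hyperlink pattern used in these notes also deletes any text that follows the URL on the same line. If only the URL itself should go, one possible variant (an assumption, not the pattern from the notes) is to match non-whitespace characters with `\S+` and then tidy up the leftover spaces:

```python
import re

tweet = 'joim me while I feed the troops. :) https://t.co/ZlcsRuUpPY in the mess'

# \S+ stops at the first whitespace, so the words after the URL survive
without_url = re.sub(r'https?://\S+', '', tweet)

# collapse the double space left behind where the URL used to be
tidy = re.sub(r'\s+', ' ', without_url).strip()

print(tidy)
```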
```python
# remove old style retweet text "RT" at the start of the tweet
tweet = 'RT joim me while I feed the troops. :) https://t.co/ZlcsRuUpPY via @audioBoom'
re.sub(r'^RT[\s]+', '', tweet)
# Output: 'joim me while I feed the troops. :) https://t.co/ZlcsRuUpPY via @audioBoom'
```

```python
# remove stock market tickers like $GE
# '$' is a regex metacharacter (ends with), so it is escaped with a backslash to match a literal '$'
tweet = 'RT joim me while I feed  $GE the troops. :) https://t.co/ZlcsRuUpPY via @audioBoom'
re.sub(r'\$\w*', '', tweet)
# Output: 'RT joim me while I feed   the troops. :) https://t.co/ZlcsRuUpPY via @audioBoom'
```

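The individual substitutions can be chained into one helper. This is only a sketch: `clean_tweet` is a hypothetical name, the sample tweet is adapted from the examples in these notes, and the URL pattern uses `\S+` (an assumption) so that text after the link is kept.

```python
import re

def clean_tweet(tweet: str) -> str:
    """Apply the tweet clean-up substitutions one after another."""
    tweet = re.sub(r'^RT\s+', '', tweet)         # leading retweet marker
    tweet = re.sub(r'\$\w*', '', tweet)          # stock tickers such as $GE
    tweet = re.sub(r'https?://\S+', '', tweet)   # hyperlinks only, not the text after them
    tweet = re.sub(r'#', '', tweet)              # hash sign, keeping the tag text
    return re.sub(r'\s+', ' ', tweet).strip()    # collapse leftover whitespace

raw = 'RT joim me while I feed  $GE the #troops. :) https://t.co/ZlcsRuUpPY via @audioBoom'
cleaned = clean_tweet(raw)
print(cleaned)
```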
### **3. One Hot Encoding**

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
```

```python
dic1 = {'col1': [1, 2, 3, 4, 4], 'col2': ['a', 'a', 'b', 'c', 'd']}
df = pd.DataFrame(dic1)
df
```

| |col1|col2|
|--|--|--|
|0|1|a|
|1|2|a|
|2|3|b|
|3|4|c|
|4|4|d|

```python
# keep only the categorical column, as a one-column DataFrame
categorical = df['col2'].to_frame()
categorical
```

| |col2|
|--|--|
|0|a|
|1|a|
|2|b|
|3|c|
|4|d|


```python
enc = OneHotEncoder()
X_ohe = enc.fit_transform(categorical)  # returns a sparse matrix by default
X_ohe.toarray()
# Output:
# array([[1., 0., 0., 0.],
#        [1., 0., 0., 0.],
#        [0., 1., 0., 0.],
#        [0., 0., 1., 0.],
#        [0., 0., 0., 1.]])
```

```python
categorical.columns
# Output: Index(['col2'], dtype='object')
```

```python
enc.get_feature_names_out(categorical.columns)
# Output: array(['col2_a', 'col2_b', 'col2_c', 'col2_d'], dtype=object)
```

```python
pd.DataFrame(X_ohe.toarray(), columns=enc.get_feature_names_out(categorical.columns))
```

| |col2_a|col2_b|col2_c|col2_d|
|--|--|--|--|--|
|0|1.0|0.0|0.0|0.0|
|1|1.0|0.0|0.0|0.0|
|2|0.0|1.0|0.0|0.0|
|3|0.0|0.0|1.0|0.0|
|4|0.0|0.0|0.0|1.0|

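To make it clearer what the encoder computes, here is a plain-Python sketch of the same idea, without scikit-learn. It is not how `OneHotEncoder` is implemented; the variable names are illustrative and the values are the `col2` entries from above.

```python
values = ['a', 'a', 'b', 'c', 'd']

# one output column per distinct category, in sorted order
categories = sorted(set(values))

# each row gets a 1.0 in the column of its own category and 0.0 elsewhere
one_hot = [[1.0 if value == category else 0.0 for category in categories]
           for value in values]

# column names in the same '<column>_<category>' style as the table above
feature_names = [f"col2_{category}" for category in categories]

print(feature_names)
for row in one_hot:
    print(row)
```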

### **4. Using functools reduce()**
Python code to demonstrate how `reduce()` works:

```python
import functools

lis = [1, 3, 5, 6, 2]

def convert(var1, var2):
    return var1 + var2

# reduce() expects a function of 2 arguments: the running result and the next element
print(f"The sum of the list elements is :{functools.reduce(convert, lis)}")
# OR
print(f"The sum of the list elements is :{functools.reduce(lambda a, b: a + b, lis)}")
# Output:
# The sum of the list elements is :17
# The sum of the list elements is :17
```

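`reduce()` also accepts an optional third argument, an initial value, which is used as the starting result and is returned unchanged for an empty iterable. A minimal sketch, using `operator.add` and `operator.mul` in place of hand-written functions (the variable names are illustrative):

```python
import functools
import operator

lis = [1, 3, 5, 6, 2]

total = functools.reduce(operator.add, lis, 0)      # starts from 0
product = functools.reduce(operator.mul, lis, 1)    # starts from 1
largest = functools.reduce(lambda a, b: a if a > b else b, lis)

# with an initial value, an empty list is fine
empty_total = functools.reduce(operator.add, [], 0)

# without one, reduce() raises TypeError on an empty list
try:
    functools.reduce(operator.add, [])
    raised = False
except TypeError:
    raised = True

print(total, product, largest, empty_total, raised)
```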

```python
import functools
import pandas as pd

# concatenate the dataframes one after another to get a single big dataframe at the end
df1 = pd.DataFrame({'a': [1], 'b': [112]})
df2 = pd.DataFrame({'a': [2], 'b': [2312]})
df3 = pd.DataFrame({'a': [12], 'b': [22]})

all_data = [df1, df2, df3]

functools.reduce(lambda a, b: pd.concat([a, b], axis=0), all_data)
# Note: pd.concat(all_data, axis=0) produces the same result in a single call
```

| |a|b|
|--|--|--|
|0|1|112|
|0|2|2312|
|0|12|22|


### **5. Using filter() function**

```python
mylist = [1, 2, 3, 20, 30, 40]

newlist = list(filter(lambda x: x if x>10 else None, mylist)) # Read Note below
print(newlist)
# Output: [20, 30, 40]
```

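`filter()` keeps the elements for which the function returns a truthy value, so the function can simply return the comparison result. The sketch below shows that form, the equivalent list comprehension, and a pitfall of returning the element itself; the `values` list is a made-up example.

```python
mylist = [1, 2, 3, 20, 30, 40]

# the predicate returns True/False directly
newlist = list(filter(lambda x: x > 10, mylist))

# equivalent list comprehension
same = [x for x in mylist if x > 10]

# pitfall: returning the element drops falsy values such as 0
values = [-5, 0, 5]
drops_zero = list(filter(lambda x: x if x > -10 else None, values))
keeps_zero = list(filter(lambda x: x > -10, values))

print(newlist, same)
print(drops_zero, keeps_zero)
```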