
[Installing Python packages in Jupyter Notebooks](https://github.com/microsoft/vscode-jupyter/wiki/Installing-Python-packages-in-Jupyter-Notebooks)
<h4><code>!pip install</code> vs <code>%pip install</code></h4>
<p>Any command prefixed with <code>!</code> is treated as a shell command in Jupyter cells. Thus <code>!pip install &lt;module&gt;</code> runs as the plain shell command <code>pip install &lt;module&gt;</code>, which may resolve to a <code>pip</code> that belongs to a different Python environment than the one the notebook kernel runs in. The recommendation is therefore to use <code>python -m pip install &lt;module&gt;</code>, which installs into the environment of the interpreter that is invoked.</p>
<p>To get this behavior from a notebook cell, use <code>%pip install &lt;module&gt;</code>.</p>
<h4><code>%conda install</code></h4>
<p>When installing packages from Jupyter into a conda environment, however, <code>conda install</code> is preferred over <code>pip install</code>. It is therefore highly recommended to use <code>%conda install</code> in Jupyter notebooks when working with Conda environments.</p>
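
As a rough illustration of why the interpreter matters, the sketch below builds a `pip` command tied to the interpreter that is running the code. The helper name `pip_install_command` is hypothetical, and the command is only printed, not executed.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


# Path of the interpreter behind the current kernel (or script).
print(sys.executable)

# The command is printed rather than run; it could be handed to subprocess.run(...).
print(pip_install_command("pandas"))
```

Comparing `sys.executable` with the output of `!which pip` (or `!where pip` on Windows) is one way to check whether the shell's `pip` and the kernel belong to the same environment.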

In [None]:
# %pip install pandas
# %conda install pandas   # preferred when working in a Conda environment


In [None]:
import pandas as pd
import numpy as np


In [7]:
# Replace column values with the ones given in a mapping
# Replace sex and smoker with numbers
df = pd.DataFrame(np.array([['Ann','female','yes'],
                ['Boris','male','no'],
                ['Kate','female','yes']]),
                    columns= ['Name', 'sex', 'smoker'])
df

Unnamed: 0,Name,sex,smoker
0,Ann,female,yes
1,Boris,male,no
2,Kate,female,yes


In [11]:
# get_dummies   Convert categorical variable into dummy/indicator variables.

df = pd.get_dummies(df, columns=['sex'])
df

Unnamed: 0,Name,smoker,sex_female,sex_male
0,Ann,yes,1,0
1,Boris,no,0,1
2,Kate,yes,1,0


In [12]:
# df['sex2']=df['sex'].map({'male':1, 'female':0})
df['smoker2']=df['smoker'].map({'yes':1,'no':0})
df

Unnamed: 0,Name,smoker,sex_female,sex_male,smoker2
0,Ann,yes,1,0,1
1,Boris,no,0,1,0
2,Kate,yes,1,0,1


In [13]:
import pandas as pd

# Create a sample dataframe with a categorical column 'color'
df = pd.DataFrame({'color': ['red', 'green', 'blue', 'red']})

# Use get_dummies to convert the 'color' column into one-hot encoded columns
df_encoded = pd.get_dummies(df['color'])
df_encoded

Unnamed: 0,blue,green,red
0,0,0,1
1,0,1,0
2,1,0,0
3,0,0,1


In [14]:

# Concatenate the original dataframe with the one-hot encoded columns
df_concat = pd.concat([df, df_encoded], axis=1)

print(df_concat)


   color  blue  green  red
0    red     0      0    1
1  green     0      1    0
2   blue     1      0    0
3    red     0      0    1


### `LATEX` in Markdown

$$
\begin{aligned}
y &= a + bx + \epsilon \\
\epsilon &\sim \mathcal{N}(0,\,\sigma^{2})
\end{aligned}
$$

This means that

$$
p(\epsilon)=\frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{\epsilon^2}{2\sigma^2}}.
$$
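
The density formula can be checked numerically. The sketch below is only an illustration: the function name `normal_pdf` and the sample values are assumptions, and `statistics.NormalDist` from the standard library serves as the reference.

```python
import math
from statistics import NormalDist


def normal_pdf(eps, sigma):
    """Density of N(0, sigma^2) evaluated at eps, written exactly as in the formula."""
    return math.exp(-eps**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


sigma = 2.0  # arbitrary illustrative value
for eps in (-1.0, 0.0, 3.0):
    ours = normal_pdf(eps, sigma)
    reference = NormalDist(mu=0.0, sigma=sigma).pdf(eps)
    # The last column reports whether both computations agree.
    print(eps, ours, math.isclose(ours, reference))
```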


In [116]:
%%bash
echo "hello from $BASH"

hello from /bin/bash


In [16]:

%%writefile foo.py
print('Hello world')

Overwriting foo.py


In [17]:
%run foo

Hello world


In [18]:
#Finding all subsets (of given length) of a set in one line
from itertools import combinations, permutations
print(list(combinations([1, 2, 3, 4], 2) ), end='')
print('\n\n')
print(list(permutations([1, 2, 3, 4], 3) ) )

[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]


[(1, 2, 3), (1, 2, 4), (1, 3, 2), (1, 3, 4), (1, 4, 2), (1, 4, 3), (2, 1, 3), (2, 1, 4), (2, 3, 1), (2, 3, 4), (2, 4, 1), (2, 4, 3), (3, 1, 2), (3, 1, 4), (3, 2, 1), (3, 2, 4), (3, 4, 1), (3, 4, 2), (4, 1, 2), (4, 1, 3), (4, 2, 1), (4, 2, 3), (4, 3, 1), (4, 3, 2)]


In [19]:
!python --version

Python 3.10.8


In [20]:
# dir(__builtin__)

In [21]:
# ? print

In [22]:
'''
range() is a data type of its own; it is neither an iterator nor a generator. next() cannot be applied directly to a range():
it must first be converted to an iterator with iter(), and only then can we step through the sequence produced by range().
'''

r = range(0,10,1)
i = iter (r)
print(next(i))
print(next(i))
print(next(i))

print (r, '|', i)

0
1
2
range(0, 10) | <range_iterator object at 0x7f7e402f2a90>
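
A small sketch of the practical difference, with illustrative variable names: a `range` can be traversed any number of times, while an iterator created from it is used up after one pass.

```python
r = range(3)

# A range is a reusable sequence: every pass starts from the beginning.
first_pass = list(r)
second_pass = list(r)

# An iterator built from the range is single-use.
it = iter(r)
consumed = list(it)
leftover = list(it)  # nothing remains after the first pass

print(first_pass, second_pass)
print(consumed, leftover)
print(r[1], len(r), 2 in r)  # indexing, len() and membership all work on a range
```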


In [23]:
len(r)

10

In [24]:
l = list(r)
print(l,type(l), sep=' | ')

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] | <class 'list'>


In [25]:
for x in range(5):
  print(x)

0
1
2
3
4


In [26]:
l = list ((range(10)))
l

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [27]:
# reverse the list
#%%timeit
l3=l[::-1]
l3

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

In [28]:
l3=list(reversed(l))
l3

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

In [29]:
#  slice as an object
slice_3_4 = slice(2,4)
a=l3[slice_3_4]
b=l[slice_3_4]
print(slice_3_4, type(slice_3_4), a, b, sep=' | ')

slice(2, 4, None) | <class 'slice'> | [7, 6] | [2, 3]


In [30]:
array = ['aa','bbbbb', 'c', 'ddd']
string = '_'.join(array)
string

'aa_bbbbb_c_ddd'

In [31]:
l = list (array) 
l.append("Bang!")
print (l , sorted(l, reverse = True), sep=' - ')

['aa', 'bbbbb', 'c', 'ddd', 'Bang!'] - ['ddd', 'c', 'bbbbb', 'aa', 'Bang!']


In [32]:
l.pop(-1) # delete list item with index (-1) and return it 

'Bang!'

In [33]:
l

['aa', 'bbbbb', 'c', 'ddd']

In [34]:
len(l)

4

In [35]:
# sort on string length

print(sorted(l, key = lambda x: len(x), reverse = False))
print(sorted(l, key = lambda x: len(x), reverse = True))

['c', 'aa', 'ddd', 'bbbbb']
['bbbbb', 'ddd', 'aa', 'c']


#    sorted - does not modify the object
#    sort   - modifies it in place
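
The difference between `sorted` and `sort` can be checked directly. This is a minimal sketch with illustrative variable names:

```python
words = ['aa', 'bbbbb', 'c', 'ddd']

# sorted() builds a new list and leaves the original untouched.
by_length = sorted(words, key=len)
unchanged = list(words)

# list.sort() reorders the list in place and returns None.
result = words.sort(reverse=True)

print(by_length)
print(unchanged)
print(words, result)
```

Because `sort()` returns `None`, writing `words = words.sort()` silently discards the list.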

In [36]:
def comparator(string):
  return len(string)

In [37]:
comparator(l)

4

In [38]:
sorted(l, key = comparator)

['c', 'aa', 'ddd', 'bbbbb']

In [39]:
l.sort() # sort() modifies the list in place
l

['aa', 'bbbbb', 'c', 'ddd']

In [40]:
items =      [1,3,2,7,33,4,5,9,3,17,999]
shuffled_items = [9,7,4,999,5,3,1]
print(items)
print(shuffled_items)

[1, 3, 2, 7, 33, 4, 5, 9, 3, 17, 999]
[9, 7, 4, 999, 5, 3, 1]


In [41]:
# sort by each value's index in items
# take each shuffled_items entry and look up its index in items; if it is absent, index() raises ValueError
shuffled_items.sort(key= lambda x: items.index(x))
print(items)
print(shuffled_items)

[1, 3, 2, 7, 33, 4, 5, 9, 3, 17, 999]
[1, 3, 7, 4, 5, 9, 999]
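
`items.index(x)` scans the list on every call. One possible alternative, sketched below with the hypothetical helper dictionary `position`, precomputes each value's first index once. Note that a value missing from `items` then raises `KeyError` instead of `ValueError`.

```python
items = [1, 3, 2, 7, 33, 4, 5, 9, 3, 17, 999]
shuffled_items = [9, 7, 4, 999, 5, 3, 1]

# Map each value to the index of its first occurrence, like items.index() would.
position = {}
for index, value in enumerate(items):
    position.setdefault(value, index)

# Sort by the precomputed position instead of scanning items for every element.
ordered = sorted(shuffled_items, key=position.__getitem__)
print(ordered)
```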


In [42]:
methods = dir(__builtins__)  # list the names in __builtins__
print (type(methods), '\n')
#The enumerate object yields pairs containing a count (from start, which
#defaults to zero) and a value yielded by the iterable argument.

for num, method  in enumerate(methods):  # print the numbered list of methods
  if num % 25 == 1:
    print(num, method)

<class 'list'> 

1 AssertionError
26 ImportError
76 __import__
101 dir
126 license
151 str


In [43]:
for num, method  in enumerate(methods):  # print the numbered list of methods
  if num % 25 == 1:
    print(num, method)

1 AssertionError
26 ImportError
76 __import__
101 dir
126 license
151 str


In [44]:
enum = list(enumerate (methods)) #  list of (num, method) pairs
zip_enum = list(zip (range(0,len(methods)),
                    methods))  # zip pairs each running number with an entry of methods
print( enum == zip_enum)

True


In [45]:
enum = list(enumerate (methods, start = 0)) #  list of (index, value) pairs, where value is a method name
#  zip joins two sequences (range and methods), stopping at the shorter one
zip_enum = list(zip(range(0,len(methods)) , methods ))  
print( enum == zip_enum)

True


In [46]:
first = 'a b c d e f g'.split(' ')

zip_style = zip(range(0, len(first)), first)

print(list(zip_style))

# Unlike a list, iterators (enumerate and zip both return iterator objects) support only a single pass over themselves;
# once that pass is finished they are exhausted

print(list(zip_style))  # the iterator is already exhausted here

[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'), (6, 'g')]
[]


In [47]:
? enumerate

Init signature: enumerate(iterable, start=0)
Docstring:     
Return an enumerate object.

  iterable
    an object supporting iteration

The enumerate object yields pairs containing a count (from start, which
defaults to zero) and a value yielded by the iterable argument.

enumerate is useful for obtaining an indexed list:
    (0, seq[0]), (1, seq[1]), (2, seq[2]), ...
Type:           type
Subclasses:     


In [48]:
# print a list of 100 numbers
", ".join(  [str(x) for x in range(100)]  )  # this is a string, not a list


'0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99'

In [49]:
a = [str(x) for x in range(100)]
a[:9]

['0', '1', '2', '3', '4', '5', '6', '7', '8']

In [50]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [51]:
t = 'abc', 12, 12.9
t2 = 12.,
print(t,type(t),'\n',t2,type(t2))
print(t2[0],  type(t2[0]))

('abc', 12, 12.9) <class 'tuple'> 
 (12.0,) <class 'tuple'>
12.0 <class 'float'>


In [67]:
l=['abg.asdas', 'dsfasd.sdf.sdf','sdf...asd','asdf.asdfa','sdfas']
# print the strings that contain more than 2 '.'
print ([x for x in l if x.count('.') > 2 ])
# list each string together with the number of dots in it
list(zip([x.count('.') for x in l ], l))

['sdf...asd']


[(1, 'abg.asdas'),
 (2, 'dsfasd.sdf.sdf'),
 (3, 'sdf...asd'),
 (1, 'asdf.asdfa'),
 (0, 'sdfas')]

In [68]:
s1 = set(range(12))
s2 = set (range(7,20))
print(s1.difference(s2)) # Return the difference of two or more sets as a new set.
print(s2.difference(s1))
print(s1.intersection(s2)) # Return the intersection of two sets as a new set.
print(s1.union(s2)) # Return the union of sets as a new set.

{0, 1, 2, 3, 4, 5, 6}
{12, 13, 14, 15, 16, 17, 18, 19}
{7, 8, 9, 10, 11}
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}


In [54]:
d = { a:a**2 for a in range(1,11)}
d

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100}

In [69]:
text="Hello"
print(text[4:100])

o


In [56]:
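# Mode "w" truncates the file on open, so the second block overwrites the first
# and out.txt ends up containing only "56789".
with open("out.txt", "w") as file:
    for i in range(5):
        file.write(str(i))

with open("out.txt", "w") as file:
    for i in range(5, 10):
        file.write(str(i))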

In [76]:
vec = [[1,2,3], [4,5,6], [7,8,9]]
[elem      for sublist in vec     for elem in sublist]

[1, 2, 3, 4, 5, 6, 7, 8, 9]

The outer loop `for sublist in vec` iterates over each element `sublist` in the list `vec`.

The inner loop `for elem in sublist` iterates over each element `elem` in the current `sublist`.

On each iteration of the inner loop, the value of `elem` is appended to the list that the comprehension builds.

Once both loops have finished, the result holds every element of `vec`, flattened into a one-dimensional list.
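
To make the loop order explicit, here is a sketch of the same flattening written three ways. The variable names are illustrative, and `itertools.chain.from_iterable` is a standard-library alternative rather than something the comprehension uses internally.

```python
from itertools import chain

vec = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Explicit nested loops, in the same order as the comprehension's for-clauses.
flat_loop = []
for sublist in vec:
    for elem in sublist:
        flat_loop.append(elem)

# The comprehension reads left to right: outer loop first, inner loop second.
flat_comprehension = [elem for sublist in vec for elem in sublist]

# Standard-library alternative that chains the sublists together.
flat_chain = list(chain.from_iterable(vec))

print(flat_loop == flat_comprehension == flat_chain)
```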

In [78]:
[elem for elem in vec]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [79]:
vec = [[1,2,3], [4,5,6], [7,8,9]]

for elem in vec:
   for num in elem:
     print(num)

1
2
3
4
5
6
7
8
9


In [96]:
matrix = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
    ]

[[sublist[i] for sublist in matrix] for i in range(len(matrix[0]) )]

[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

In [98]:
for sublist in matrix:
  print (sublist )

[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10, 11, 12]


In [99]:
from functools import reduce
# build a single string from the list with sep=', '
methods = dir(__builtin__)
# reduce() applies a function of two arguments cumulatively to
# the items of a sequence or iterable, from left to right,
# so as to reduce the iterable to a single value.
# The result is already a str, so no extra ''.join() is needed.
reduce(lambda x, y: x + ', ' + y, methods)



In [62]:
from collections import defaultdict, Counter

# let's count the letters across a list of names
counter = Counter()
for word in dir(__builtin__):
    for letter in word:
        counter[letter] += 1  # or replace the inner loop with counter.update(word)
    
print(counter)

Counter({'r': 229, 'e': 140, 'o': 138, 'n': 109, 't': 102, 'i': 95, 'a': 70, 's': 60, 'E': 58, 'c': 53, 'l': 48, '_': 47, 'p': 42, 'd': 41, 'm': 31, 'u': 30, 'g': 24, 'y': 23, 'b': 20, 'x': 17, 'I': 14, 'h': 12, 'W': 12, 'f': 11, 'v': 9, 'F': 8, 'N': 8, 'A': 7, 'P': 7, 'R': 7, 'S': 7, 'U': 7, 'k': 6, 'O': 6, 'D': 6, 'T': 6, 'B': 5, 'C': 5, 'L': 3, 'w': 3, '1': 3, 'K': 2, 'M': 2, '0': 2, 'z': 2, 'G': 1, 'V': 1, 'Z': 1, 'Y': 1, 'H': 1, '4': 1, '2': 1, 'j': 1})


In [106]:
arg ='__builtin__'  # note: this is a plain str, not the builtins module
print(len(set(dir(arg))), len(set(dir(object))), len(set(dir(arg)) - set(dir(object))))
# print the attributes of str (the type of arg) that object does not have
print(set(dir(arg)) - set(dir(object)))

80 23 57
{'swapcase', 'translate', '__getnewargs__', 'find', '__mul__', 'rpartition', 'format_map', 'strip', 'maketrans', 'rjust', 'lstrip', 'isprintable', '__getitem__', 'title', 'capitalize', '__rmul__', 'rfind', '__len__', 'encode', 'expandtabs', 'partition', 'endswith', '__add__', 'removeprefix', 'center', 'isspace', 'splitlines', 'isalnum', '__iter__', 'isnumeric', 'format', 'rindex', 'isascii', 'ljust', 'rsplit', 'casefold', 'removesuffix', 'join', 'lower', 'isupper', 'istitle', '__mod__', 'isdigit', '__contains__', 'isdecimal', 'rstrip', 'isalpha', 'zfill', 'islower', 'index', '__rmod__', 'count', 'isidentifier', 'upper', 'replace', 'startswith', 'split'}


# Search through ipynb files

In [110]:
!ls

ABD-22 LABa -Kuzmin Pandas.ipynb
ABD-22 LABa.ipynb
ABD-22 Total HW.ipynb
Constraint Programming.ipynb
DLS  simpsons_baseline.ipynb
DLS Python Base.ipynb
NETOLOGY PySpark Лекция 7.ipynb
Netlogy ERRORS_DATES .ipynb
Netlogy ERRORS_DATES HW.ipynb
Netology Numpy _HW.ipynb
Netology Pandas HW funcs group.ipynb
Netology Pandas HW_SQL.ipynb
Netology Pandas_JOIN.ipynb
Netology Pandas_JOIN_HW.ipynb
Netology Pandas_JOIN_SQL_HW .ipynb
Netology Pandas_PIVOT_and_PYMYSTEM.ipynb
Netology Pandas_basics.ipynb
Netology Pandas_functions_groupby.ipynb
Netology Pandas_lecture_04.ipynb
Netology Pandas_pivot_and_str.ipynb
Netology Pandas_presentation_mode.ipynb
Netology Pandas_vs_set_and_dict.ipynb
Netology Python  FUNCTIONS DICTS_sorting args  kwargs.ipynb
Netology Python  LAB.ipynb
Netology Python  netology Functions HW4 .ipynb
Netology Python CLASSES.ipynb
Netology Python Datatypes LOOPs.ipynb
Netology Python HW1 Основы.ipynb
Netology Python HW2.ipynb
Netology Python Scrapping Parsing HW.ipynb
Netology Pyth

### cmd

In [115]:
# !grep -o -r --include  \*.ipynb SQL ipynb
# search SQL in \*.ipynb
!grep -o -r --include  \*.ipynb SQL ipynb

# search from  notebook

In [114]:
import glob

# pattern = '../**/*.ipynb'
pattern = '*.ipynb'

# q = 'SQL'
q = 'try'

for filepath in glob.iglob(pattern, recursive=True):
    with open(filepath, encoding='UTF-8') as file:
        # print(file)
        s = file.read()
        if (s.find( q ) > -1):
            print(filepath)

Netlogy ERRORS_DATES HW.ipynb
Netology Pandas_JOIN_HW.ipynb
Netology stats_Case_study HW.ipynb
Netology stats_Practice_AB.ipynb
Netology Python list dict.ipynb
Netology math_Numpy Matrix.ipynb
Python SQL.ipynb
Netology_SCRAPING PARSING.ipynb
Netology stats_A_B-testing HW.ipynb
Netology Pandas_basics.ipynb
PYTHON MathPlotLib VIZ.ipynb
DLS Python Base.ipynb
Netology Pandas HW_SQL.ipynb
Netology stats Power Min_sample size_W.ipynb
Netlogy ERRORS_DATES .ipynb
Netology Python Scrapping Parsing HW.ipynb
Netology Python _Bases2.ipynb
Netology stats_case_study_Darwinbooks_recommend.ipynb
Netology math_LA_HW_3.ipynb
ABD-22 Total HW.ipynb
Netology Pandas_JOIN_SQL_HW .ipynb
Netology_REGEX_practice.ipynb
Python_Tricks.ipynb
DLS  simpsons_baseline.ipynb
Netology Pandas_functions_groupby.ipynb
Netology Python_7_ERRORS_DATETIME.ipynb
Netology stats Case_Power_MinSampleSize_HW.ipynb
Netology stats_t-Tests.ipynb
Netology stats Визуализация данных HW.ipynb
Netology stats_Confidence_Intervals.ipynb
NETOL
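
Reading the raw file text also matches hits inside cell outputs and metadata. Since `.ipynb` files are JSON, one possible refinement is to search only the cell sources. The sketch below assumes the nbformat 4 layout (a top-level `cells` list whose entries carry a `source` field); the helper name `notebook_mentions` and the demo notebook are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def notebook_mentions(path, query):
    """Return True if any cell source in the notebook at `path` contains `query`."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The source is stored either as one string or as a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        if query in text:
            return True
    return False


# Throwaway notebook: "try" appears in the code, "SQL" only in the output.
demo = {
    "cells": [
        {
            "cell_type": "code",
            "source": ["try:\n", "    pass\n", "except Exception:\n", "    pass\n"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["SQL\n"]}],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

with tempfile.TemporaryDirectory() as folder:
    demo_path = Path(folder) / "demo.ipynb"
    demo_path.write_text(json.dumps(demo), encoding="utf-8")
    found_try = notebook_mentions(demo_path, "try")
    found_sql = notebook_mentions(demo_path, "SQL")

print(found_try, found_sql)
```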

### Integer division and remainder

In [None]:
a, b = -9, -5
a - a//b * b
print (a % b)
a % b == a - a//b * b

-4


True
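
The identity `a == (a // b) * b + a % b` holds for any sign combination because `//` rounds toward negative infinity, so the remainder takes the sign of the divisor. The sketch below checks this with `divmod` and contrasts it with `math.fmod`, whose result takes the sign of the dividend; the sample pairs are arbitrary.

```python
import math

for a, b in [(-9, -5), (9, -5), (-9, 5), (9, 5)]:
    q, r = divmod(a, b)      # same as (a // b, a % b)
    assert a == q * b + r    # the identity holds for every sign combination
    print(a, b, q, r)

# math.fmod truncates toward zero instead, so the sign follows the dividend.
print(-9 % 5, math.fmod(-9, 5))
```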

In [None]:
2**3**2 == 2**(3**2) == 2**9 == 512

True

## python reserved *keywords*

In [None]:
help('keywords')


Here is a list of the Python keywords.  Enter any keyword to get more help.

False               class               from                or
None                continue            global              pass
True                def                 if                  raise
and                 del                 import              return
as                  elif                in                  try
assert              else                is                  while
async               except              lambda              with
await               finally             nonlocal            yield
break               for                 not                 



#### `Input a series of integers`

In [122]:
# read five space-separated integers from one line of input
d1, d2, d3, d4, d5 = map(int, input().split())
d1, d2, d3, d4, d5

(1, 5, 7, 4, 3)

##### `Steps to select only those dataframe rows, which contain any NaN value `
* **Step 1**: Call the `isnull()` function on the dataframe, as in `df.isnull()`. It returns a boolean dataframe of the same size containing only `True` or `False` values. `True` indicates a NaN at the corresponding position in the original dataframe, and `False` indicates a non-NaN value.
* **Step 2**: Call `any(axis=1)` on the boolean dataframe, as in `df.isnull().any(axis=1)`. The `any()` function looks for any `True` value along the given axis. With `axis=1` it looks across the columns of each row: for each row it checks all column values and reduces them to a single value. If any column in a row contains NaN, the reduced value for that row is `True`. The result is therefore a boolean Series in which each value represents one row of the dataframe, and `True` means that the row has one or more NaN values.
* **Step 3**: Pass this boolean Series to the `[]` operator of the dataframe, i.e. `df[df.isnull().any(axis=1)]`. It returns only the rows where the boolean Series is `True`, that is, only the rows that contain at least one NaN value.
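
The three steps can be traced on a small made-up dataframe. The column names and values below are purely illustrative and are not the election data used in this notebook.

```python
import numpy as np
import pandas as pd

# Illustrative data: rows "B" and "C" each contain one NaN.
sample = pd.DataFrame({
    "county": ["A", "B", "C"],
    "votes": [10, np.nan, 7],
    "share": [0.5, 0.2, np.nan],
})

is_nan = sample.isnull()            # Step 1: boolean dataframe of the same shape
row_has_nan = is_nan.any(axis=1)    # Step 2: one boolean per row
rows_with_nan = sample[row_has_nan] # Step 3: keep only the flagged rows

print(row_has_nan.tolist())
print(rows_with_nan)
```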

In [None]:
df[df.isna().any(axis=1)]

Unnamed: 0,State Postal,County Name,FIPS,Obama vote,%,Romney vote,%.1
1684,ME,Upton,23017,0,,0,


#### `Find Most Frequent Value in a List`

In [None]:
mylist = [2,2,5,7,4,8,4,9,6,8,7,5,7]
freq = max(set(mylist), key = mylist.count)
freq

7
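
`max(set(mylist), key=mylist.count)` rescans the list once per distinct value. A possible single-pass alternative, sketched here with `collections.Counter`, also returns the count; when several values tie, `most_common` keeps the one encountered first.

```python
from collections import Counter

mylist = [2, 2, 5, 7, 4, 8, 4, 9, 6, 8, 7, 5, 7]

# Counter tallies every value in one pass; most_common(1) yields the top (value, count) pair.
value, count = Counter(mylist).most_common(1)[0]
print(value, count)
```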