# Introduction To Python, strings, lists, dictionaries, lambda, map, zip, enumerate, generators
by Martín Araya

# numpy and pandas introduction
23/feb/2021

## map
The function __map__ is used to apply a function to each item in an iterable object (e.g. list, tuple, dictionary).
  
__map__ is commonly used to cast the items of a list to a different type, like converting the numbers contained in a list into strings.

In [None]:
# define a list 
L=[1,2,3,4,5,6,7]
print('the list:',L)


# convert and concatenate the numbers into a string 
LS=''.join( map(str,L) )
print('the string:',LS)

Notice that combining __map__ with the __.join()__ method is equivalent to the following loop:

In [None]:
s=''
for l in L :
   s=s+str(l)
print('the string:',s)

In [None]:
# keep in mind that .join can concatenate any string between the items of the iterable
' my_string '.join( map(str,L) )

__map__ returns an object that provides the values from the _iterable_ evaluated by the _function_ as they are requested. In other words, __map__ by itself never evaluates the entire set of data at once.

In [None]:
# notice that defining the map itself does not return the list L with their items converted to string 
map(str,L)

In order to display the values evaluated through the __map__ we need to _expand_ it by converting it into a list or tuple, iterating over it in a __for__ loop, etc.

In [None]:
# list with values converted to string using the str function
list( map(str,L) )

In [None]:
for i in map(str, L) :
    print(type(i),i)
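
The lazy behaviour described above can be observed with a small sketch. The helper `noisy_str` and the list `calls` are hypothetical names introduced only for this illustration: they record each time the function is really executed.

```python
# the same kind of list used above
L = [1, 2, 3, 4, 5, 6, 7]

calls = []  # hypothetical log of the items that have been evaluated

def noisy_str(x):
    # behaves like str, but records every call in the list 'calls'
    calls.append(x)
    return str(x)

m = map(noisy_str, L)
calls_after_define = list(calls)   # snapshot: nothing evaluated yet
print('evaluated after defining the map:', calls_after_define)

first = next(m)                    # request a single value
calls_after_one = list(calls)      # snapshot: only the first item evaluated
print('evaluated after one request:', calls_after_one)

rest = list(m)                     # expanding the map evaluates the remaining items
print('evaluated after expanding:', calls)
```

Only the items that have been requested appear in `calls`, which is why defining the map alone shows no converted values.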

The map instance by itself doesn't contain the resulting values of the function applied to the items of the iterable container; therefore, it cannot be directly accessed.  
The following loop will raise a TypeError, as it is not possible to directly subscript an item from the map instance:

In [None]:
i = 0
while i < 7 :
    # raises TypeError: the map object does not support indexing with [ ]
    print(type(i), map(str, L)[i] )
    i += 1
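
A minimal sketch of one way around this error, assuming we can afford to evaluate all the items: catch the TypeError to see its message, then expand the map into a list, which is subscriptable.

```python
L = [1, 2, 3, 4, 5, 6, 7]

message = None
try:
    map(str, L)[0]                 # a map object cannot be subscripted
except TypeError as error:
    message = str(error)
    print('TypeError:', message)

# expanding the map into a list makes the values accessible by position
LS = list(map(str, L))
i = 0
while i < len(LS):
    print(type(LS[i]), LS[i])
    i += 1
```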

## generators

As __map__ is not subscriptable, it is easier to use it in a __for__ loop.
  
To use __map__ in a __while__ loop, we need to assign the map object to a variable and then call the **__next__()** method of the _map object_ inside the __while__.

Objects that provide their values one at a time through **__next__()** are called _iterators_. A _generator_ is an iterator defined with a function, and the map object behaves in the same lazy way.

In [None]:
i = 0
m = map(str,L)
while i < 7 :
    print(type(i), m.__next__() )
    i+=1

As soon as __m__ has provided its last item it stops providing values, and any further request raises a __StopIteration__ exception:

In [None]:
m.__next__()
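
The exception can be handled in more than one way. The sketch below shows two options using the built-in function `next`, which calls the **__next__()** method for us; the default text `'nothing left'` is an arbitrary value chosen for this example.

```python
L = [1, 2, 3]
m = map(str, L)

# option 1: catch the StopIteration exception to leave the loop
values = []
while True:
    try:
        values.append(next(m))     # next(m) is equivalent to m.__next__()
    except StopIteration:
        break
print(values)

# option 2: give next() a default value, returned when there are no more items
leftover = next(m, 'nothing left')
print(leftover)
```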

**Generator**s are designed to save memory and avoid unnecessary computation, as the outputs are calculated when requested, not in advance.
  
Defining a __generator__ is similar to defining a __function__, but instead of using _return_ to give back the results we have to use _yield_:

In [None]:
# a generator to provide an unlimited sequence of powers: 0**0, 1**1, 2**2, ...

# def to define it, like any function
def MyGenerator() :
    i = 0
    while i >= 0 : # put the operations inside a loop to keep them live and ready 
        yield i**i # yield instead of return
        i += 1

MyGen = MyGenerator()

In [None]:
for i in MyGen :
    print(i)
    if i > 30 :
        break

A nice thing that the __generator__ can do is keep track of the last value it has _yielded_.  
We can _iterate_ over the generator again and it will continue from where it stopped, without starting from zero or loading the input data again:

In [None]:
for i in MyGen :
    print(i)
    if i > 1000 :
        break

or call the **__next__()** method

In [None]:
print( MyGen.__next__() )
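
Instead of a __break__ inside the loop, a fixed number of values can be taken from an endless generator with `itertools.islice` from the standard library. This is a sketch that redefines the same generator so it can run on its own:

```python
from itertools import islice

def MyGenerator():
    i = 0
    while True:
        yield i**i                 # 0**0, 1**1, 2**2, ...
        i += 1

MyGen = MyGenerator()

# take only the first five values, the generator stays paused afterwards
first_five = list(islice(MyGen, 5))
print(first_five)

# the next request continues from where the generator stopped
following = next(MyGen)
print(following)
```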

Notice that the generator we have just created contains an infinite loop, so it will never run out of values, but it is not running while we are not requesting values from it.  
  
We can define a generator in any way we require. Here is an example:

In [None]:
# a generator to provide a certain number of powers of two

# def to define it, like any function
def powerange(limit=0) :
    i = 0
    while i < limit : # put the operations inside a loop to keep them live and ready 
        yield 2**i # yield instead of return
        i += 1

In [None]:
# call the generator evaluated in the required arguments
for i in powerange(10) :
    print(i)

# automatically stops generating numbers after 10 iterations

In this last example the generator object is not assigned to a variable, so it is not kept in memory after the loop execution, just as happens when we call __range__ directly in the __for__ statement.  
If we call it again it will start from zero:

In [None]:
for i in powerange(3) :
    print(i)

In order to keep the generator alive we have to store it in a variable when we call it, like we did before:

In [None]:
PG = powerange(100)

In [None]:
# start printing values from PG
T = True
while T :
    value = PG.__next__()
    print( value )
    if value > 100 :
        break    

In [None]:
# continue printing values from PG
for i in PG :
    print(i)
    if i > 1000 :
        break

In [None]:
# call a couple of values using the .__next__() method
print( PG.__next__() )
print( PG.__next__() )
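
The difference between a stored generator and a fresh call can be summarized in a short sketch. It repeats the definition of `powerange` with a small limit so that it runs on its own:

```python
def powerange(limit=0):
    i = 0
    while i < limit:
        yield 2**i
        i += 1

PG = powerange(5)                  # stored generator, keeps its position
a = next(PG)
b = next(PG)

fresh = next(powerange(5))         # a new call creates a new generator, starting from zero

remaining = list(PG)               # the stored generator continues where it stopped
exhausted = list(PG)               # once finished, it provides nothing more

print(a, b, fresh, remaining, exhausted)
```

Each call to `powerange` creates an independent generator object, while `PG` remembers how far it has advanced until it is exhausted.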

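Another example of a generator, useful to defoliate the daisy (_deshojar la margarita_, the game of "loves me, loves me not", printed here in Spanish as _me quiere_ / _no me quiere_):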

In [None]:
def daisy() :
    petal = False
    while True :
        yield petal
        petal = not petal

defoliate = daisy()

for i in range(12) :
    print( defoliate.__next__() * 'no ' + 'me quiere...' )

More info related to _generators_ can be found here: https://wiki.python.org/moin/Generators

## lambda
__lambda__ is used to define simple _anonymous_ functions in a single line of code, without requiring a name or a docstring. A __lambda__ **could be complex, but that is not recommended**.
  
__lambda__ is very handy to create functions for simple operations like mathematical operations or string operations
  
To define __lambda__ we have to provide input arguments and the operation to be performed:  
_variable_ = __lambda__ _input(s)_ __:__ _operations_over_input(s)_
  
When defining and using a __lambda__ we have to be careful about the type of inputs these functions will have to handle.

In [None]:
# we can define a lambda with one or several inputs
F=lambda x,y : x*3+y
G=lambda x : x*3

# we can define a lambda to operate with a particular type of object, 
# but it will fail if we provide a different kind of input 
U=lambda s : s.upper()

# we could also define a lambda to perform different operations, according to the kind of input
X=lambda z : U(z) if type(z) is str else G(z)

# the previous lambda, saved in the variable X, is equivalent to the definition here below
def Y(z) :
    if type(z) is str :
        return U(z)
    else :
        return G(z)


print('F(4,5)',F(4,5))
print('G(3)',G(3))
print("U('f')",U('f'))
print("X('h')",X('h'))
print("X(5)",X(5))
print("Y('h')",Y('h'))
print("Y(5)",Y(5))

It is possible to apply our functions using the __map__ function:

In [None]:
# list with values of list L evaluated under G function
list( map(G,L) )
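
As mentioned in the description of __lambda__, the type of the input matters. The following sketch redefines __G__ and __U__ so it can run on its own, writes a lambda directly inside __map__, and shows what happens with inputs of an unexpected type:

```python
L = [1, 2, 3, 4, 5, 6, 7]
G = lambda x: x*3
U = lambda s: s.upper()

# a lambda can be written directly inside map, without saving it in a variable
tripled = list(map(lambda x: x*3, L))
print(tripled)

# with a string, the same operation repeats the text instead of multiplying a number
repeated = G('ab')
print(repeated)

# U only works with strings, an integer has no .upper() method
failure = None
try:
    U(5)
except AttributeError as error:
    failure = str(error)
    print('AttributeError:', failure)
```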

## strings
Warming up with strings and loops.

In [None]:
# define a string
S='abcdefghijklmnopqrstuvwxyz'

In [None]:
S

applying the _lambda_ function __U__ to the _string_ __S__ using map

In [None]:
# map applies U to every character of S, then join concatenates the uppercase characters
''.join(list( map(U,S) ))

In [None]:
# loop iterating over the position of the characters in the string
for i in range(len(S)) :
    j = S[i]
    print(type(i),i,type(j),j,j.upper())

In [None]:
# loop iterating directly over characters in the string
i = 0
for j in S :
    print(type(i),i,type(j),j,j.upper())
    i += 1  # increase the counter after printing, so positions start from 0

In [None]:
# alternative loop iterating directly over characters in the string
for j in S :
    i = S.index(j)
    print(type(i),i,type(j),j,j.upper())

# be careful: the .index() method always returns the position of the first occurrence of the character in the string
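
The problem with **.index()** does not show up in the string __S__ because none of its letters is repeated. A sketch with a word containing repeated letters (the word `'banana'` is just an example chosen for this illustration) makes it visible:

```python
word = 'banana'                    # example string with repeated characters

positions_index = []               # positions obtained with .index()
positions_counter = []             # positions obtained with a manual counter

i = 0
for j in word:
    positions_index.append(word.index(j))
    positions_counter.append(i)
    i += 1

print('using .index():', positions_index)
print('using a counter:', positions_counter)
```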

## enumerate
The __enumerate__ function is very useful combined with a __for__ loop, as it provides the index of each item together with the item itself from the iterable.

__enumerate__ returns two items to the __for__ loop, so we need two variables to receive them:

for _variable1_,_variable2_ in __enumerate__( _iterable_ ) :

In [None]:
for i,j in enumerate(S) :
    print(type(i),i,type(j),j,j.upper())

Using __enumerate__ makes this loop much simpler than the alternative loops.

Notice that __enumerate__ returns a tuple of two values which, when received by the two variables _i_ and _j_, is automatically unpacked.

In [None]:
# enumerate returns a tuple
for k in enumerate(S) :
    i,j=k
    print(type(k),k,type(i),i,type(j),j,j.upper())
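
By default the count starts from zero, but __enumerate__ accepts an optional `start` argument. A short sketch, using a shorter string than __S__ to keep the output small:

```python
S = 'abc'

# start the count from 1 instead of 0
for position, letter in enumerate(S, start=1):
    print(position, letter)

# expanded into a list, each item is a tuple (count, item)
numbered = list(enumerate(S, start=1))
print(numbered)
```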

the output from __enumerate__ can be easily converted into a dictionary:

In [None]:
Dict = dict(enumerate(S))
Dict

## dictionaries
Using a __for__ loop we can iterate over a _dict_ in several ways:

In [None]:
# print keys by default
for d in Dict :
    print(type(d),d)

In [None]:
# print keys explicitly
for d in Dict.keys() :
    print(type(d),d)

In [None]:
# print values
for d in Dict.values() :
    print(type(d),d)

In [None]:
# print keys and values
for d in Dict.items() :
    print(type(d),d)

In [None]:
# print keys and values
for k,v in Dict.items() :
    print(type(k),k,type(v),v)

## zip
The __zip__ function is very useful to combine two lists (or other iterables) into pairs of values, taking one item from each list at a time.

In [None]:
# let's create a second list using our list L
GL1 = []
for i in L :
    GL1.append( G(i) )
GL1

In [None]:
# create a dictionary using zip, with the items of L as keys and the items of GL1 as values
D1 = dict( zip( L , GL1 ) )
D1
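
Two details of __zip__ are worth keeping in mind: like __map__, it provides its pairs only when requested, and it stops as soon as the shortest input runs out of items. A sketch, where the string `letters` is a hypothetical second input shorter than the list:

```python
L = [1, 2, 3, 4, 5, 6, 7]
letters = 'abc'                    # hypothetical second iterable, shorter than L

# zip needs to be expanded, and stops at the shortest input
pairs = list(zip(L, letters))
print(pairs)

# the pairs can be separated again by passing them back to zip with *
numbers, chars = zip(*pairs)
print(numbers, chars)
```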

## intro to numpy
If we apply the _lambda_ function __G__ defined as:  
_G = lambda x : x*3_  
to a list, hoping to obtain the numbers multiplied by three, we will be disappointed, because multiplying a list repeats the list instead of multiplying the numbers:

In [None]:
print(L)
print(G(L)) # remember that G = lambda x : x*3

To achieve what we want we have to operate through the list using a __for__ or __while__ loop, or do it in a single line using a _list comprehension_.
A **_list comprehension_** is a __for__ loop written inside the list definition.

In [None]:
GL2 = [ G(l) for l in L ]
GL2
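
The three approaches seen so far give the same result. The sketch below compares them and also shows that a list comprehension can include a condition; the names `with_loop`, `with_map` and `even_only` are only for this illustration:

```python
L = [1, 2, 3, 4, 5, 6, 7]
G = lambda x: x*3

# explicit for loop
with_loop = []
for l in L:
    with_loop.append(G(l))

# list comprehension
with_comprehension = [G(l) for l in L]

# map expanded into a list
with_map = list(map(G, L))

print(with_loop == with_comprehension == with_map)

# a condition at the end keeps only the items that satisfy it
even_only = [G(l) for l in L if l % 2 == 0]
print(even_only)
```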

To use __numpy__ we start by *import*ing the library.  
For further details about __numpy__ please visit https://numpy.org/  
__numpy__ is commonly abbreviated as __np__

In [None]:
import numpy as np

The basic structure in numpy is the **_array_**. To start loading data into numpy, we can convert a list into _np.array_  
  
To build an array, all the elements must be of the same type. Unlike a list, an array can't contain different types of elements; if the types are mixed, numpy converts them to a common type.

In [None]:
Ar = np.array( L )
Ar

using arrays we can perform mathematical operations directly over the array and the operation is performed on every item inside the array:

In [None]:
# multiply the array
Ar*3

In [None]:
# apply a function to the array
G(Ar)
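
A side by side sketch of the difference between a list and an array, including what happens with the element type when the input contains mixed types. The names `mixed` and `text` are hypothetical, used only here:

```python
import numpy as np

L = [1, 2, 3]
Ar = np.array(L)

print(L * 3)                       # the list is repeated three times
print(Ar * 3)                      # every item of the array is multiplied by three

# all the items of an array share a single type, stored in .dtype
print(Ar.dtype)

# integers mixed with floats are all converted to float
mixed = np.array([1, 2.5])
print(mixed, mixed.dtype)

# numbers mixed with strings are all converted to strings
text = np.array([1, 'a'])
print(text, text.dtype)
```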

The array can have several dimensions, not just 1D.

In [None]:
# 2D array created from two lists and one array
Ar2 = np.array( [ L , GL2 , G(G(Ar)) ] )
Ar2

In [None]:
Ar2+5

On top of the regular math operations, __numpy__ already provides several useful operations to apply over the _array_, like summation, mean, median, maximum, minimum, etc.

In [None]:
Ar2.sum()

In [None]:
Ar2.mean()
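
Called without arguments, these methods operate over all the items of the array. With the `axis` argument they operate along a single dimension. A sketch that rebuilds an array with the same values as __Ar2__ so it can run on its own:

```python
import numpy as np

L = [1, 2, 3, 4, 5, 6, 7]
Ar2 = np.array([L, [3*x for x in L], [9*x for x in L]])

total = Ar2.sum()                  # sum of every item in the array
col_sums = Ar2.sum(axis=0)         # one sum per column (adds the rows together)
row_sums = Ar2.sum(axis=1)         # one sum per row (adds the columns together)
row_means = Ar2.mean(axis=1)       # one mean per row

print(total)
print(col_sums)
print(row_sums)
print(row_means)
```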

The dimensions of an array are contained in the property __.shape__ of the array

In [None]:
# 1D array
Ar.shape

In [None]:
# 2D array
Ar2.shape

To transpose the array, simply call the method **.transpose()** or the property **.T**

In [None]:
Ar2.transpose()

In [None]:
Ar2.T

The _array_ can also be compared against a condition, resulting in a boolean array:

In [None]:
Ar3 = Ar2>10
Ar3

The most common properties and operations of the **list**s are also available for the **np.array**.  
The array is *subscriptable* and its items can be accessed using **[ ]** 

In [None]:
# for a single item, a single position
Ar[5]

In [None]:
# for several items, a slice
Ar[3:7]

We can also provide a condition instead of fixed positions to access the items in the array:

In [None]:
Ar2[ Ar2>30 ]

or provide a boolean array with the same dimensions as the main array; then only the positions coinciding with a True will be returned:

In [None]:
Ar2[ Ar3 ]
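
Conditions can be combined with the operators `&` (and) and `|` (or), keeping each condition inside parentheses, and a condition can also be used to assign new values. A minimal sketch with a 1D array; the names `middle` and `capped` are only for this example:

```python
import numpy as np

Ar = np.array([1, 2, 3, 4, 5, 6, 7])

# items greater than 2 and lower than 6
middle = Ar[(Ar > 2) & (Ar < 6)]
print(middle)

# work on a copy to leave the original array untouched
capped = Ar.copy()
capped[capped > 5] = 5             # replace every item greater than 5
print(capped)
```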

An array can also contain strings:

In [None]:
# string array with a single item
Ar4 = np.array(S)
Ar4

In [None]:
# convert the string into a list and then convert the list into an array
Ar5 = np.array(list(S))
Ar5

Keep in mind that, as __numpy__ is designed for numbers, not all the basic Python string operations work on an array.
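
For instance, the array has no **.upper()** method. The sketch below shows the error and two possible workarounds: a list comprehension over the items, and the function `np.char.upper` from numpy's own set of string operations.

```python
import numpy as np

Ar5 = np.array(list('abc'))

# the string method is not available on the array itself
failure = None
try:
    Ar5.upper()
except AttributeError as error:
    failure = str(error)
    print('AttributeError:', failure)

# workaround 1: apply the string method item by item
upper_loop = np.array([s.upper() for s in Ar5])
print(upper_loop)

# workaround 2: use the string function provided by numpy
upper_np = np.char.upper(Ar5)
print(upper_np)
```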

## intro to pandas
__pandas__ is a library designed to hold 1D or 2D arrays in a more readable format and to facilitate common operations with them.  
All the details about pandas can be found here: https://pandas.pydata.org/  
  
The basic structures of pandas are the __Series__ and the __DataFrame__.  
The __Series__ consists of an array with an associated _index_ array. It is like a pair of arrays that work together.  
The __DataFrame__ is composed of several arrays associated with an _index_ array.
  
We start using __pandas__ by importing it. The usual alias for pandas is __pd__

In [None]:
import pandas as pd
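
Before moving to DataFrames, here is a minimal sketch of a __Series__, to show the two arrays that work together. The values and the index labels are arbitrary, chosen for this example:

```python
import pandas as pd

# a Series: an array of values with an associated index
ser = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(ser)

print(ser.values)                  # the array of data
print(ser.index)                   # the associated index
print(ser['b'])                    # access a value through its index label

# several Series sharing the same index can form a DataFrame
df = pd.DataFrame({'A': ser, 'B': ser * 2})
print(df)
```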

We can convert numpy.arrays, dictionaries, lists, etc to pandas DataFrames

In [None]:
# DataFrame from a numpy array
df1 = pd.DataFrame( Ar2 )
df1

In [None]:
# DataFrame from a numpy array transposed
df2 = pd.DataFrame( Ar2.T )
df2

notice the **DataFrame** constructed from the array *does not have names on its columns*.  
we can define the names using the **.columns** property

In [None]:
# define column names
df2.columns=['A','B','C']
df2

To change the _index_ values we can assign new values to the **.index** property:

In [None]:
df2.index = df2.index / 2
df2

We can construct a **DataFrame from a dictionary**.  
In this case, the _keys_ will be the name of the columns and the _values_ will be the data contained in that column

In [None]:
df3 = pd.DataFrame( {'A':[1,2,3,4,5],'B':[6,7,8,9,0]} )
df3

We can access the data in the DataFrame in a similar way to how we access and set the data in a dictionary.  
To access the data we use **[[** *name of column* **]]** or **[** *name of column* **]** where _single square bracket_ will return a **Series** while _double square bracket_ will return a **DataFrame**

In [None]:
df2[['A']]

In [None]:
df2[['A','C']]
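
The difference between single and double square brackets can be checked by looking at the type of the result. A sketch with a small DataFrame created for this example:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

single = df['A']                   # single square bracket
double = df[['A']]                 # double square bracket

print(type(single), single.shape)
print(type(double), double.shape)
```

The Series has a single dimension, while the DataFrame keeps two dimensions even when it has only one column.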

To access the data by position we can use **.iloc[** *row* , *column* **]** where the column parameter is optional

In [None]:
df2.iloc[[2]]

In [None]:
df2.iloc[[-1]]

In [None]:
# row and column will return the single value
df2.iloc[2,2]

In [None]:
# a range of row positions 
df2.iloc[2:5]

In [None]:
# an entire column by position
df2.iloc[:,2]

Instead of **.iloc** we can use **.loc** to access the data by the values of their index and column labels, instead of their positions. Apart from that, it works like .iloc, with one difference: a slice in .loc includes its end label.

In [None]:
df2.loc[1]

In [None]:
df2.loc[[1]]

In [None]:
# access data by slice
df2.loc[0.9:2.7]

notice that, in the case of a sorted numeric index, the minimum and maximum limits of the slice do not need to be present in the index

In [None]:
# access a particular location
df2.loc[2,'A']
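
A sketch comparing slices by position and by label, using a small DataFrame with an index built in the same way as the index of __df2__ (the data itself is made up for this example):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7]})
df.index = df.index / 2            # index values: 0.0, 0.5, 1.0, ... 3.0

# by position: the end position (5) is not included
by_position = df.iloc[2:5]
print(by_position)

# by label: the end label (2.0) is included
by_label = df.loc[1.0:2.0]
print(by_label)

# by label, with limits that are not present in the index
between = df.loc[0.9:2.7]
print(between)
```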

With pandas it is very easy to explore the data, using the predefined __plots__

In [None]:
df2.plot()

In [None]:
df2.plot(kind='box')

In [None]:
df2.plot(kind='scatter',x="A",y="B")