# Chapter 1. Introduction to Python

# 1.1 Python Basics
Data types are the classification or categorization of data items.

## Variable assignment

In [1]:
x = 1
x

1

## Data types

### Numbers (int, float)
- Integer (int): a number without a fractional part.
- Floating point (float): a number that has both an integer and a fractional part, separated by a point.

In [2]:
my_integer = 4
print(my_integer)

my_float = 3.141592
print(my_float)

4
3.141592


### String (str)
- A type to represent text. Single or double quotes can be used to build a string.

In [3]:
my_string = "Hello, I am Daniel and work as a Data Scientist."
print(my_string)

Hello, I am Daniel and work as a Data Scientist.


### Boolean (bool)
- A type to represent logical values. It can only be True or False.

In [4]:
my_boolean = True
print(my_boolean)

True


## Finding out about data types

### type()
- A function that returns the type of a value (or of the variable that refers to that value).

        type(1)                 # int
        type("Hello, world!")   # str
        type(False)             # bool
        type([1, 2, 3])         # list

### Data type conversion functions

We can convert a Python value into another type:

    int(x) converts a Python value into an integer.
    float(x) converts a Python value into a floating point number.
    str(x) converts a Python value into a string.
    bool(x) converts a Python value into a boolean.
    list(x) converts a Python value into a list.
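
As a minimal sketch of the conversion functions listed above (the variable names are illustrative and not part of the original notebook):

```python
# Convert between basic types; all names here are illustrative.
as_int = int("42")          # string -> integer
as_float = float("3.14")    # string -> floating point number
as_str = str(2.5)           # float -> string
as_bool = bool(0)           # 0 converts to False; non-zero numbers convert to True
as_list = list("abc")       # string -> list of its characters
truncated = int(3.99)       # int() drops the fractional part, it does not round

print(as_int, as_float, as_str, as_bool, as_list, truncated)
```

Note that a conversion can fail: for example, int("hello") raises a ValueError because the text does not represent a number.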

# 1.2 Python Lists

**[a, b, c]**
- A list can store any type of information (strings, numbers, booleans).
- You can also create sublists for each category (a list of lists).
- To access the information in a list, we use an index. The first element always has index 0. We can use a positive index (counting from the beginning of the list) or a negative index (counting from the end of the list).

        list_1[0]
        list_2[-4]

- Slicing selects multiple elements from a list, creating a new list.

        list_1[start:end]
        # start is inclusive, end is exclusive

## Example of a list

In [5]:
a = 34.5
b = "Logan"
c = True
d = 7
e = "Up"
f = False
list_1 = [a, b, c, d, e, f]
list_1

[34.5, 'Logan', True, 7, 'Up', False]

##  List of lists

In [6]:

list_2 = [["Mexico", 12],
         ["USA", 3],
         ["Japan", 2],
         ["Germany", 7]]
list_2

[['Mexico', 12], ['USA', 3], ['Japan', 2], ['Germany', 7]]

## Subsetting lists

In [7]:
print(list_1[0])
print(list_2[3])

print(list_1[-1])
print(list_2[-4])

34.5
['Germany', 7]
False
['Mexico', 12]


## List slicing

In [8]:

print(list_1[2:5])
print(list_1[:5])
print(list_1[2:])

[True, 7, 'Up']
[34.5, 'Logan', True, 7, 'Up']
[True, 7, 'Up', False]


## Changing list elements

In [9]:
list_1[1:3] = ["China", 5]
print(list_1)

[34.5, 'China', 5, 7, 'Up', False]


## Adding elements

In [10]:
list_3 = list_1 + ["Logan", 7]
print(list_3)

[34.5, 'China', 5, 7, 'Up', False, 'Logan', 7]


## Removing elements

In [11]:
del list_3[2]
print(list_3)

[34.5, 'China', 7, 'Up', False, 'Logan', 7]


# 1.3 Functions and packages

## Python Functions
A function is a piece of reusable code that solves a particular task.
- Pro tip: Call an existing function instead of writing the code yourself.
- How to find functions: if it is a standard task, a function probably already exists. Search for it on the internet.

        help(function) prints the documentation for a function. In IPython or Jupyter, ?function does the same.

In [12]:
# A few examples
# type()
# help()
# print()
# round()
# pow()
# int()
# float()
# str()
# bool()
# len()
# max()
# sorted()
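
A short sketch of a few of the built-in functions listed above in use. The list `heights` and the other variable names are made up for illustration:

```python
heights = [1.73, 1.68, 1.71, 1.89]   # illustrative data

n_items = len(heights)               # number of elements in the list
tallest = max(heights)               # largest value
ordered = sorted(heights)            # returns a new sorted list; heights is unchanged
rounded = round(1.68, 1)             # round to one decimal place
squared = pow(2, 3)                  # 2 raised to the power of 3

print(n_items, tallest, ordered, rounded, squared)
# help(round) would print the documentation for round()
```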

## Python Methods
Methods are functions that are associated with an object and can manipulate its data or perform actions on it.

### List methods

In [13]:
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

#### index()

In [14]:
# index() returns the index of the first occurrence of a list element
fam.index('mom')

4

#### count()

In [15]:
# count() counts the number of times an element occurs in the list
fam.count(1.73)

1

### String methods

In [16]:
# sister = 'liz'
sister = fam[0]

#### capitalize()

In [17]:
# capitalize() returns a string with the first letter capitalized.
sister.capitalize()

'Liz'

#### replace()

In [18]:
# replace() returns a new string with every occurrence of the first argument replaced by the second.
sister.replace('z','sa')

'lisa'

#### upper(), lower()

In [19]:
# upper() and lower() return a string with all its elements uppercase and lowercase, respectively.
sister.upper()

'LIZ'

#### count()

In [20]:
# count() counts the number of times a substring occurs in the string.
sister.count("i")

1

### Methods available for both lists and strings

#### index()

In [21]:
# index() returns the index of the first occurrence of an element in a list or string.
print(fam.index('mom'))
print(sister.index('z'))

4
2


### Methods that can change the objects they are called on

#### append()

In [22]:
# append() adds a new element at the end of the list.
fam.append("me")
fam.append("1.79")
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89, 'me', '1.79']

#### remove()

In [23]:
# remove() removes the first element of a list that matches the input
fam.remove("1.79")
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89, 'me']

#### reverse()

In [24]:
# reverse() reverses the order of the elements of a list
fam.reverse()
fam

['me', 1.89, 'dad', 1.71, 'mom', 1.68, 'emma', 1.73, 'liz']
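
The difference between methods that return a new object and methods that change the object in place can be sketched as follows. The names `name` and `people` are illustrative and do not come from the cells above:

```python
name = "liz"
shouted = name.upper()          # returns a new string
# name is still "liz": strings are immutable

people = ["liz", "emma"]
result = people.append("mom")   # changes the list in place
# append() returns None, so assigning its result is a common mistake

print(name, shouted)
print(people, result)
```

String methods such as upper() and replace() leave the original string untouched, whereas list methods such as append(), remove(), and reverse() modify the list they are called on.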

## Python Packages
Directories that contain Python scripts (modules).
- Modules specify functions, methods, and types.
- Some are available by default; others need to be installed on your system.
- [pip documentation](https://pip.pypa.io/en/stable/)

        Install: py -m pip install <package>
        Upgrade: py -m pip install <package> --upgrade
        Uninstall: py -m pip uninstall <package>
        Remove one package from the cache: py -m pip cache remove <pattern>
        Purge the whole cache: py -m pip cache purge
        List installed packages: py -m pip list

### How to import a package

In [25]:
# importing the package under an alias and using it (most common)
import numpy as np
np.array([1,2,3])

# importing and using the package
# import numpy
# numpy.array([1,2,3])

# importing a specific function from a package
# from numpy import array
# array([1,2,3])

array([1, 2, 3])

# 1.4 Numpy
- Numpy arrays contain only one type. Because of this, calculations on them are fast.
- You can subset arrays, much like lists.
- If the values used to build an array have different types, all of them are coerced to a single common type.
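
A minimal sketch of the single-type rule, assuming Numpy is installed. The array names are illustrative:

```python
import numpy as np

ints = np.array([1, 2, 3])              # all integers: integer dtype
mixed_num = np.array([1, 2.5, 3])       # integers are promoted to float
mixed_str = np.array([1.73, "1.68"])    # everything becomes a string

# dtype reports the single type shared by all elements of an array
print(ints.dtype, mixed_num.dtype, mixed_str.dtype)
```

The exact integer dtype printed for `ints` depends on the platform, so it is not shown here.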

In [26]:
height = [1.73, 1.68, 1.72, 1.89, 1.79]

In [27]:
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

In [28]:
import numpy as np
np_height = np.array(height)
np_height

array([1.73, 1.68, 1.72, 1.89, 1.79])

In [29]:
np_weight = np.array(weight)
np_weight

array([65.4, 59.2, 63.6, 88.4, 68.7])

In [30]:
bmi = np_weight / (np_height**2)
bmi

array([21.85171573, 20.97505669, 21.49810708, 24.7473475 , 21.44127836])

In [31]:
python_list = [1,2,3]
python_array = np.array(python_list)

In [32]:
python_list + python_list

[1, 2, 3, 1, 2, 3]

In [33]:
python_array + python_array

array([2, 4, 6])

In [34]:
bmi

array([21.85171573, 20.97505669, 21.49810708, 24.7473475 , 21.44127836])

## Numpy subsetting

In [35]:
bmi[1]

20.97505668934241

In [36]:
bmi > 23

array([False, False, False,  True, False])

In [37]:
bmi_values = bmi[bmi > 23]
bmi_values

array([24.7473475])

## 2D Numpy Arrays

    Here arr stands for any 2D Numpy array:

    arr.shape (dimensions of the array)
    arr[n] (row n)
    arr[n][m] or arr[n, m] (a single element)
    arr[:, m1:m3] (all rows, columns m1 up to but not including m3)

In [38]:
np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])
np_2d

array([[ 1.73,  1.68,  1.71,  1.89,  1.79],
       [65.4 , 59.2 , 63.6 , 88.4 , 68.7 ]])

In [39]:
np_2d.shape # 2 rows, 5 columns

(2, 5)

### Subsetting

In [40]:
np_2d[0]

array([1.73, 1.68, 1.71, 1.89, 1.79])

In [41]:
np_2d[0][2]

1.71

In [42]:
np_2d[0,2]

1.71

In [43]:
np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, "68.7"]])

array([['1.73', '1.68', '1.71', '1.89', '1.79'],
       ['65.4', '59.2', '63.6', '88.4', '68.7']], dtype='<U32')

In [44]:
np_2d[:,1:3]

array([[ 1.68,  1.71],
       [59.2 , 63.6 ]])

## Numpy Basic Statistics

    np.mean()
    np.median()
    np.corrcoef()
    np.std()
    np.sum()
    np.sort()
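
The functions listed above could be applied to the small height and weight arrays from earlier in this chapter, roughly as follows. This is a sketch rather than a cell from the original notebook, and the result variable names are illustrative:

```python
import numpy as np

np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])

mean_height = np.mean(np_height)           # arithmetic mean
median_height = np.median(np_height)       # middle value after sorting
std_height = np.std(np_height)             # standard deviation
total_weight = np.sum(np_weight)           # sum of all elements
sorted_weight = np.sort(np_weight)         # returns a sorted copy
corr = np.corrcoef(np_height, np_weight)   # 2x2 correlation matrix

print(mean_height, median_height, std_height, total_weight)
print(sorted_weight)
print(corr)
```

The off-diagonal entries of `corr` hold the correlation between height and weight; the diagonal entries are the correlation of each variable with itself.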

### Generate data with Numpy

    np.random.normal()
        arguments: distribution mean, distribution standard deviation, number of samples

    np.column_stack((array1, array2, ...))
        takes a tuple of arrays and pastes them together as columns.

In [45]:
height = np.round(np.random.normal(1.75,0.20,5000),2)
weight = np.round(np.random.normal(60.32,15,5000),2)
np_city = np.column_stack((height,weight))

In [46]:
np_city

array([[ 1.57, 63.33],
       [ 1.8 , 52.63],
       [ 1.51, 55.82],
       ...,
       [ 1.66, 71.31],
       [ 1.45, 76.75],
       [ 1.64, 56.85]])
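
Because the data is random, the values shown above change on every run. As a sketch, the generated array can be checked against the parameters it was drawn from. The call to np.random.seed(0) is an assumption added here to make the run repeatable and is not part of the original cells:

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the run is repeatable
height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight))

mean_height = np.mean(np_city[:, 0])   # first column: heights
std_weight = np.std(np_city[:, 1])     # second column: weights

print(np_city.shape)    # (5000, 2)
print(mean_height)      # should be close to 1.75
print(std_weight)       # should be close to 15
```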