# __Why Python as a tool?__

Python has several features that make it well suited for learning (and doing) data science:
- It’s open source, which means it is free to use.
- It behaves consistently across most applications, which makes it a predictable language and a desirable one to program in.
- It’s relatively simple to write, to maintain and, in particular, to understand. Thanks to its pragmatism, it is widely accepted by programmers.
- It’s useful in many applications. It was designed as a general-purpose programming language, so it can cooperate with a variety of other software components, which makes it a good language for gluing together code written in other languages.
- It’s simple and fast to process, both for humans and for tools. As a very high-level language (VHLL), it affords high programmer productivity, which makes Python a strong development tool.
- It’s an object-oriented programming language that also supports functional and procedural programming styles.
- It has lots of useful data science–related libraries. 


It is important to differentiate a Python implementation from the language itself. A language implementation is a system for executing computer programs, and Python uses the _interpretation approach_, where the program is read as input by an interpreter, which performs the actions written in the program.

At the time of this writing, __Classic Python__ (also known as CPython, and often just called Python) is the most up-to-date, solid, and complete production-quality implementation of the language. CPython consists of a bytecode compiler, an interpreter, and a set of built-in and optional modules, all coded in standard C.

Another interesting tool built on CPython is IPython, which enhances CPython's interactive interpreter to make it more powerful and convenient. IPython's notebook interface has since evolved into __*Jupyter Notebooks*__, an interactive programming environment that, alongside snippets of code, lets you embed commentary in literate programming style and shows the output of executing the code.

There is more to Python programming than just the language. There are plenty of libraries and extensions that suit almost any application. Most of the modules work across different versions of Python and let code access functionality supplied by the underlying operating system or other software components such as graphical user interfaces (GUIs), databases, and networks. Extensions also afford great speed in computationally intensive tasks such as XML parsing and numeric array computations, which makes them especially suitable for data science.

You do not need to be proficient in Python to the standard of a fluent software developer in order to process data successfully. To be a high-performing data scientist or data analyst, it’s recommended to get a solid foundation in the basics: the lexical structure, data types, variables, control flow statements, and functions.

The distribution of Python I recommend the most is Anaconda. This distribution bundles the Python standard library, some external extensions, and IPython, so you are ready to tackle data analysis tasks.

When doing data science with Python, your code is expected to be written in a _Pythonic way_, meaning it should be concise and efficient. Pythonic code is often associated with the use of list comprehensions, which are ways to implement useful data processing functionality with a single line of code.
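As a minimal sketch of what this means in practice, the snippet below writes the same filtering-and-transforming logic twice: first as an explicit loop and then as a list comprehension. The data and the variable names (`numbers`, `squares_loop`, `even_squares`) are made up for illustration only.

```python
# Hypothetical sample data, used only for illustration
numbers = [1, 2, 3, 4, 5, 6]

# Loop version: build the result step by step
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n ** 2)

# List comprehension: the same logic in a single line
even_squares = [n ** 2 for n in numbers if n % 2 == 0]

print(squares_loop)   # [4, 16, 36]
print(even_squares)   # [4, 16, 36]
```

Both versions produce the same list; the comprehension simply states the transformation and the filter in one expression.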


## Lexical structure

The lexical structure of a programming language is the set of basic rules that govern how you write programs in that language. It is the lowest-level syntax of the language, specifying such things as what variable names look like and how to denote comments. 

Each Python source file is a text file that Python reads as a sequence of lines, tokens, and statements.


### Lines and Indentation

A Python program is a sequence of *logical lines*, each made up of one or more physical lines. A physical line may end with a comment, indicated by a hash ( `#` ) sign placed anywhere that is not inside a string literal. All characters after the `#`, up to but excluding the line end, are the comment: Python ignores them.


In [21]:
# This is a single-line comment 
# There are no double-line \ 
# comments 

Python does not use delimiters, such as the semicolon ( ; ), to mark the end of physical lines; the line end denotes the end of most statements. However, a logical line can be made up of two or more physical lines. To join them, end the physical line with a backslash ( `\` ), or leave an open parenthesis ( `(` ), bracket ( `[` ) or brace ( `{` ) unclosed. Physical lines after the first one are called *continuation lines*. Triple-quoted string literals can also span physical lines; they are mostly used for SQL queries and for longer explanatory text such as documentation strings.

In [13]:
# This is the simplest logical line, made of a single physical line. In this case, an assignment
variable = 5

# This is a statement made of two physical lines that form a single logical line.
# In this case, it's an assignment of a data type called list to a variable.
variable = [1, 2,\
            3, 4] # This is the continuation line.
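The other continuation forms mentioned above can be sketched as follows. This is only an illustration: the variable names and the SQL text (including the `users` table) are hypothetical.

```python
# Implicit continuation: an open parenthesis keeps the logical line going
total = (1 + 2 +
         3 + 4)

# Explicit continuation: a backslash at the very end of the physical line
message = "first part, " + \
          "second part"

# A triple-quoted string literal may span several physical lines
query = """SELECT name
FROM users"""

print(total)     # 10
print(message)   # first part, second part
print(query)
```

Inside parentheses, brackets, or braces the backslash is not needed, which is why many style guides prefer the implicit form.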

Python uses _indentation_ to express the block structure of a program. Blocks of statements are denoted by indentation rather than by braces or other begin/end delimiters. A __block of statements__ is a contiguous sequence of logical lines, all indented by the same amount, and a logical line with less indentation ends the block. All the statements in a block must have the same indentation, as must all the clauses in a compound statement.

In [22]:
# First group of statements: declare and initialize two variables
num1 = 6
num2 = 9

# Second group of statements: compute the sum and print it
total = num1 + num2   # named total to avoid shadowing the built-in sum()
print('This is the output:', total)

# Both groups have the same indentation, so they belong to the same block.

This is the output: 15


Python treats each tab as if it were up to 8 spaces; nevertheless, the standard Python style is to use four spaces per indentation level. __You must be careful, because Python 3 rejects indentation that mixes tabs and spaces inconsistently.__
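To make the block structure concrete, here is a small sketch with two nested blocks, each indented four spaces further than the line that introduces it. The `scores` values and the passing threshold are hypothetical.

```python
scores = [4, 9, 7]   # hypothetical values
passed = 0

for score in scores:
    # Body of the for loop: indented one level (four spaces)
    if score >= 5:
        # Body of the if clause: indented two levels (eight spaces)
        passed += 1

# A line with less indentation ends both blocks
print("Passed:", passed)   # Passed: 2
```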

### Tokens

These are the _elementary lexical components_ of a logical line. Each token corresponds to a substring of the logical line. Whitespace is used to separate tokens where, in its absence, Python would parse two adjacent tokens as a single longer identifier. The normal token types are _identifiers, keywords, operators, delimiters, and literals_.

__Identifiers__

These are names used to identify variables, functions, classes, modules, or other objects. They always start with a letter or an underscore ( `_` ), followed by zero or more letters, underscores, and digits. Case is significant: lowercase and uppercase are distinct, and punctuation characters such as  `@` , `$` , and `!` are not allowed.

Normal Python style is to start class names with an uppercase letter, and most other identifiers with lowercase  letters. Starting an identifier with a single leading underscore indicates by convention that the identifier is meant to be private. Starting an identifier with a double underscore indicates a strongly private identifier; if the identifier also ends with two trailing underscores, however, this means that it’s a language-defined special name. 

In [3]:
first_variable = 200    #Variables are named with lowercase letters. 
                        # Compound identifiers are joined with an underscore.
_private_variable = 100 #This is a private variable (by convention)

print("This is a regular variable: ",first_variable,", This is a private variable: ", _private_variable)

This is a regular variable:  200 , This is a private variable:  100


__Delimiters__

Python uses the following characters and character combinations as delimiters in various statements, expressions, and list, dictionary, and set literals and comprehensions, among other purposes. The quote characters ` ' ` and ` " ` surround string literals.

In [6]:
(   )   [   ]   {   }
,   :   .   =   ;   @
+=  -=  *=  /=  //= %=
&=  |=  ^=  >>= <<= **=


__Keywords__

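These are reserved identifiers that Python uses for syntactic purposes, which is why they are sometimes known as *reserved words*. Recent versions of Python have 35 of them. Like any other identifiers, keywords are case sensitive. They can all be listed by importing the keyword module and printing its `kwlist` attribute as follows. The output shown here comes from Python 3.9, whose list also contains the internal entry `__peg_parser__`, which was removed in Python 3.10: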

In [5]:
import keyword 
print(keyword.kwlist)


['False', 'None', 'True', '__peg_parser__', 'and', 'as', 'assert', 'async', 'await', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']
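The identifier rules and the keyword list can be combined into a quick check of whether a name is usable. The sketch below relies on the standard `str.isidentifier()` method and `keyword.iskeyword()` function; the candidate names are hypothetical.

```python
import keyword

# Hypothetical candidate names, chosen only for illustration
candidates = ["first_variable", "_private", "2nd_value", "total$", "class"]

results = {}
for name in candidates:
    # A usable name must be a syntactically valid identifier and not a keyword
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "->", results[name])
```

Here `2nd_value` fails because it starts with a digit, `total$` because it contains a punctuation character, and `class` because it is a reserved word.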


__Literals__

These are direct denotations in a program of a data value (a number, string, or container). The following are number and string literals in Python: 


In [7]:
35             #Integer literal
3.14           #Float literal
1.0j           #Imaginary literal
'Hello'        #String literal 
"world"        #Another string literal 
"""Good
night"""       #Triple-quoted string literal, spanning two lines

'Good\nnight'

Combining numbers and string literals with the appropriate delimiters, you can directly build many container types with those literals as values:


In [8]:
[25,4.52,'Hello']       #This is a list
[]                      #This is an empty list
200,305,567             #This is a tuple
(200,305,567)           #This is another tuple
()                      #This is an empty tuple
{'a':5, 'b':67}         #This is a dictionary
{}                      #This is an empty dictionary
{2, 5, 6, 7, 'letra'}   #This is a set
#There is no literal to denote an empty set. 

{2, 5, 6, 7, 'letra'}

__Operators__

Python uses non-alphanumeric characters and character combinations as operators. They generally act within an expression, which is defined as a *phrase* of code that Python evaluates to produce a value. The simplest expressions are literals and identifiers, and you can build other expressions by joining subexpressions with operators and/or delimiters. The simplest operators are those that represent basic math operations:


In [16]:
#These are variables
a=15
b=3       

# Then we use operators as follows

# Sum:
c = a + b
# Subtraction:
d = a - b
# Product 
e = a * b
# Division
f = a / b

print('The result of the sum is: ', c)
print('The result of the subtraction is: ', d)
print('The result of the product is: ', e)
print('The result of the division is: ', f) # The result of a division is always a float

The result of the sum is:  18
The result of the subtraction is:  12
The result of the product is:  45
The result of the division is:  5.0
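Beyond the four basic operations, a few other built-in operators come up often. The following is a brief sketch with arbitrary example values; the variable names are chosen only for illustration.

```python
quotient = 17 // 5     # Floor division: discards the fractional part
remainder = 17 % 5     # Modulo: remainder of the division
power = 2 ** 5         # Exponentiation
is_larger = 17 > 5     # Comparison: produces a bool

print(quotient)    # 3
print(remainder)   # 2
print(power)       # 32
print(is_larger)   # True
```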


### Statements 

There are two types of statements: simple and compound.

__Simple Statements__

A simple statement is one that contains no other statements. It lies entirely within a logical line. In Python, you may place more than one simple statement in a logical line, with a semicolon (` ; `) as a separator. However, it is recommended to use one simple statement per line to increase readability. Any expression can be on its own as a simple statement. 

An *assignment* is a simple statement that assigns values to variables. It uses the ` = ` operator and, being a statement, can never be part of an expression. To assign within an expression, the ` := ` (walrus) operator, available since Python 3.8, is needed.


In [1]:
# This is a simple statement
variable = 345.76j
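The difference between the two forms of assignment can be sketched as follows. The example assumes Python 3.8 or later, and the `values` list is hypothetical.

```python
values = [3, 8, 12]   # hypothetical data

# Plain assignment is a statement on its own line
count = len(values)

# The walrus operator binds a name inside an expression,
# here within the condition of an if statement
if (n := len(values)) > 2:
    print("The list has", n, "items")   # The list has 3 items
```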

__Compound Statements__

A compound statement contains one or more other statements and controls their execution. It has one or more *clauses*, aligned at the same indentation. Each clause has a *header* starting with a keyword and ending with a colon ( : ), followed by a *body*, which is a sequence of one or more statements. When the body contains several statements, they go on separate logical lines after the header line, conventionally indented four spaces rightward; such a body is also known as a block.

In [2]:
# This is a compound statement 
for i in range(10):
    print("This is the ", i, "iteration of the loop")

This is the  0 iteration of the loop
This is the  1 iteration of the loop
This is the  2 iteration of the loop
This is the  3 iteration of the loop
This is the  4 iteration of the loop
This is the  5 iteration of the loop
This is the  6 iteration of the loop
This is the  7 iteration of the loop
This is the  8 iteration of the loop
This is the  9 iteration of the loop


## Data Types

In Python, any data value is considered an _object_. Every object has a type, the category under which its data value is classified. Types can be built into Python or user-defined; the latter are also called _classes_.

This section covers six kinds of built-in types in Python: *numbers, strings, lists, tuples, dictionaries, and sets*. These types can be mutable or immutable. An immutable object cannot be altered or modified through operations; therefore, when you perform an operation on an immutable object, you get a new object rather than a modified original.

### Numbers

Numeric types in Python include integers, floating-point numbers, and complex numbers. These are all immutable objects, which means that operations applied to them produce a new number object. It's important to know that numeric literals do not include a sign; a leading ( + ) or ( - ), if present, is a separate operator.

- __Integers:__ Integer literals in Python can be decimal or non-decimal. A decimal literal is a sequence of digits whose first digit is nonzero. The decimal literal is the most commonly used integer literal in data science; however, there are specific applications where _binary, octal, or hexadecimal_ literals are needed.

- __Floating-point numbers:__ These literals are a sequence of decimal digits that includes a decimal point ( . ), an exponent suffix ( e or E ), or both.

- __Complex numbers:__ These are made of two floating-point numbers, one each for the real and imaginary parts. The imaginary part is identified by the suffix ( j ) added to it, which stands for the square root of -1.

In [3]:
1 , 4 , 456, 4325678        #Are all integers
3.2, 345.59, 24567.456789   #Are all floating-point numbers
1-3j, 3.2+943j              #Are all complex numbers

((1-3j), (3.2+943j))
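The non-decimal integer literals and the exponent notation mentioned above look like this in practice. The values are arbitrary and serve only as an illustration.

```python
decimal_value = 255
binary_value = 0b11111111   # Binary literal, prefix 0b
octal_value = 0o377         # Octal literal, prefix 0o
hex_value = 0xFF            # Hexadecimal literal, prefix 0x

# All four literals denote the same integer
print(decimal_value == binary_value == octal_value == hex_value)   # True

scientific = 1.5e3          # Float literal with an exponent suffix
print(scientific)           # 1500.0

z = 3 + 4j                  # Complex number: real part 3, imaginary part 4
print(z.real, z.imag)       # 3.0 4.0
print(abs(z))               # 5.0
```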

*Iterables* are the Python concept that captures, in the abstract, the iteration behavior of __sequences__, which are ordered containers of items indexed by integers. The built-in sequence types covered here are strings, tuples, and lists.

### Strings (str)

The *str* object is a sequence of characters used to store and represent text-based information. These objects are immutable: when an operation is performed on a str object, the result is always a new string object, rather than mutating an existing string. 

A variant of string literals is the *raw string literal*, in which escape sequences are not interpreted. Raw string literals are quoted literals immediately preceded by an ( r or R ). They come in handy for strings that include many backslashes, especially regular expression patterns and Windows absolute filenames (which use backslashes as directory separators).

In [10]:
'This is a literal string'
"This is another literal string"

'This is a string that \
spans two lines '                       #Comments not allowed on the previous line. 

"""
Using triple quoted lines we can:
use "quoted expressions in a string"
and also spanning several lines """     #Comments not allowed on previous lines.

r"C:\Users\main_user\directory_name"             #This is the recommended form to write directory paths in Python when using Windows.

'C:\\Users\\main_user\\directory_name'
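Two of the points above, immutability and raw literals, can be checked with a short sketch. The strings are arbitrary examples.

```python
greeting = 'hello'
shouted = greeting.upper()   # Returns a new string object

print(greeting)   # hello  (the original is unchanged)
print(shouted)    # HELLO

# In a regular literal, \n is one character (a newline);
# in a raw literal, it stays as a backslash followed by the letter n
regular_length = len('a\nb')
raw_length = len(r'a\nb')

print(regular_length)   # 3
print(raw_length)       # 4
```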

### Tuples

A tuple is an immutable ordered sequence of items. That means that, once created, it cannot be changed. A tuple can hold mutable objects such as lists as items, but best practice is generally to avoid doing so. To denote a tuple, use a series of expressions (the items of the tuple) separated by commas ( , ); the series can optionally be enclosed in parentheses ` ( ) `.

Tuples are typically used to store collections of heterogeneous data; that is, data of different types. They are especially useful when you need a structure to hold the properties of a real-world object.


In [16]:
23, 531, 4667           #This is a tuple with 3 items. Parentheses optional
(2.72,)                 #This is a tuple with one item. Needs a trailing comma
()                      #This is an empty tuple. Parentheses not optional
tuple('wow')            #Built-in function that creates a new tuple ('w', 'o', 'w')
x = [1,2,4,6]
tuple(x)                #This creates and returns a tuple whose items are the same as those in x

(1, 2, 4, 6)
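As a small sketch of immutability in action, the snippet below stores the properties of a hypothetical sensor reading in a tuple, unpacks them into separate names, and shows that an attempt to modify the tuple raises a `TypeError`.

```python
# Hypothetical record: two coordinates and a label
point = (2.5, 7.1, 'sensor-A')

# Unpacking assigns each item to its own variable
x, y, label = point
print(x, y, label)   # 2.5 7.1 sensor-A

# Trying to change an item fails because tuples are immutable
error_raised = False
try:
    point[0] = 9.9
except TypeError as error:
    error_raised = True
    print("Tuples cannot be modified:", error)
```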

### Lists

A list is a mutable ordered sequence of items, meaning you can add, remove, and modify a list's elements. The items of a list are arbitrary objects and may be of different types. Lists can have duplicate elements. To denote a list, use a series of expressions (the items of the list) separated by commas ( , ), within brackets. 

Although Python does allow you to have elements of different data types in the same list, best practice suggests using lists to contain elements that represent a series of usually related, similar things that can be grouped together. A typical list contains only elements belonging to a single category (that is homogeneous data, such as people's names, article titles, or participant numbers). 

In [21]:
[42, 3.14, 'hello']             #List with three items. Brackets [ ] are mandatory
['title']                       #List with only one item
[]                              #Empty list
list('wow')                     #Built-in function that creates the list ['w', 'o', 'w']
x = 'letras'
list(x)                         #This creates and returns a list whose items are the same as those in x

['l', 'e', 't', 'r', 'a', 's']
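Because lists are mutable, the same list object can be changed in place. The following sketch adds, modifies, and removes elements; the names in the list are made up for illustration.

```python
names = ['Ana', 'Luis', 'Ana']   # Duplicates are allowed

names.append('Marta')   # Add an element at the end
names[1] = 'Luca'       # Modify the element at index 1
names.remove('Ana')     # Remove the first occurrence of 'Ana'

print(names)        # ['Luca', 'Ana', 'Marta']
print(len(names))   # 3
```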

### Sets

A Python set is an unordered collection of unique items. Duplicate items are not allowed in a set. To denote a *set*, you can use a series of elements separated by commas ( , ) within braces ( `{ }` ).

In [24]:
{42, 3.14, 'hello'}             #Set with three items. Braces { } are mandatory
{'title'}                       #Set with only one item
set()                           #Empty set. {} is an empty dictionary
set('wow')                      #Built-in function that creates the set {'o', 'w'}
x = 'letras'
set(x)                          #This creates and returns a set whose items are the unique letters in x


{'a', 'e', 'l', 'r', 's', 't'}
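A common use of sets is removing duplicates and testing membership. The sketch below also shows union and intersection; all values are arbitrary.

```python
readings = [3, 5, 3, 8, 5]   # hypothetical data with duplicates
unique = set(readings)       # Duplicates are dropped

print(len(unique))   # 3
print(5 in unique)   # True

first = {1, 2, 3}
second = {3, 4}
union = first | second    # Items in either set
common = first & second   # Items in both sets

print(union == {1, 2, 3, 4})   # True
print(common)                  # {3}
```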

### Dictionaries

Dictionaries are the only built-in mapping type provided by Python. Dictionaries are mutable collections of _key-value_ pairs, where each key is a unique name that identifies an item of data, the value. Since Python 3.7, dictionaries preserve the order in which keys were inserted. Each key is separated from its value by a colon ( : ), and key-value pairs are separated by commas ( , ), within braces ( `{ }` ). Dictionaries, like tuples, are useful for storing heterogeneous data about real-world objects.

In [28]:
{'x':42, 'y':3.14, 'z':35}      #This is a dictionary with 3 items, str keys
{1:56, 34:964}                  #This is a dictionary with 2 items, int keys
{23:'za', 'br':235}             #This is a dictionary with 2 items, different keys
{}                              #This is an empty dictionary
dict()                          #Built-in function that also produces an empty dictionary
dict(x=42,y=3.14,z=35)          #Built-in function that creates the dictionary {'x': 42, 'y': 3.14, 'z': 35}


{'x': 42, 'y': 3.14, 'z': 35}
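Once a dictionary exists, values are read and written through their keys. The sketch below uses a hypothetical `person` record to show item access, insertion, update, and the `get` method, which returns None for a missing key instead of raising an error.

```python
person = {'name': 'Ana', 'age': 31}   # hypothetical record

print(person['name'])   # Ana

person['city'] = 'Lima'   # Add a new key-value pair
person['age'] = 32        # Update the value of an existing key

print(person.get('email'))   # None, because the key does not exist
print(list(person))          # ['name', 'age', 'city']
```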

There are two special built-in objects worth knowing, __None__ and the __Ellipsis ( `...` )__. 

None denotes a null object, and has no methods or other attributes. It's suitable to use None as a placeholder when a reference is needed but you don't care what object you refer to, or when you need to indicate that no object is there. Functions return None as their result unless they have specific return statements coded to return other values. None can be used as a dictionary key.

The Ellipsis, written as three periods with no intervening spaces ( `...` ), is a special object used in numerical applications or as an alternative to None when None is a valid entry.
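A brief sketch of both objects follows. The function `log_message` and the `lookup` dictionary are hypothetical names introduced only for this example.

```python
def log_message(text):
    # This function has no return statement, so it returns None
    print(text)

result = log_message('saved')
print(result is None)   # True

# None is a valid dictionary key
lookup = {None: 'missing value'}
print(lookup[None])     # missing value

# The literal ... evaluates to the built-in Ellipsis object
print(... is Ellipsis)  # True
```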


## Variables and other references

Python accesses data values through *references*. A reference is a name that refers to a value (object). References take the form of variables, attributes, and items. 

A __variable__ is a name used to reference a value (object). A variable comes into existence with a statement that binds it (in other words, sets a name to hold a reference to some object), which is why there are no declarations in Python. A variable has no intrinsic type; its type is determined by the object it refers to. The __del__ statement unbinds a variable reference, although doing so is rare. Any identifier can be used to name a variable except the reserved keywords.

Attributes and items are references that belong to an object. An __attribute__ is accessed through the object followed by a period ( `.` ) and the attribute name; an attribute that can be called like a function is known as a method. An __item__ is accessed through an index or key in a set of brackets ( `[ ]` ) placed after the object. Attributes and items are widely used in data science libraries such as pandas to get information from objects.

Assignment statements can be plain or augmented. Plain assignment is how you create a new variable or rebind an existing variable, attribute, or item to a new value. Augmented assignment (for example, ` += `) cannot, per se, create new references, but it can rebind a variable, an attribute, or an item.
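The ideas in this section can be sketched in a few lines. All names below are hypothetical and chosen only for illustration.

```python
first = [1, 2, 3]      # Binding creates the variable; no declaration is needed
second = first         # Both names now reference the same list object
second.append(4)       # Attribute reference: the append method of the list

print(first)           # [1, 2, 3, 4]
print(first[0])        # Item reference by index: 1

count = 10
count += 5             # Augmented assignment rebinds an existing variable
print(count)           # 15

count = 'fifteen'      # Rebinding: the name now refers to a str object
print(type(count).__name__)   # str

del second             # Unbinds the name; the list is still reachable as first
print(first)           # [1, 2, 3, 4]
```

Note that `second.append(4)` also changes what `first` shows, because both names refer to one and the same object.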


## Python Ecosystem
_Pandas:_ For data analysis. <br>
_Matplotlib:_ Foundational library for visualization. <br>
_NumPy:_ The numeric library that serves as the foundation of most numerical computing in Python. <br>
_Seaborn:_ A statistical visualization tool built on top of Matplotlib. <br>
_Statsmodels:_ A library with many advanced statistical functions. <br>
_SciPy:_ Advanced scientific computing, including functions for optimization, linear algebra, image processing and more. <br>
_Scikit-learn:_ The most popular machine learning library for Python (not deep learning). <br>

Among many other tools for specific use-cases.