# __Why Python as a tool?__

Python has several features that make it well suited for learning (and doing) data science:
- It’s open source and free to use.
- It behaves consistently across most applications, which makes it a predictable language to program in.
- It’s relatively simple to write, to maintain and, in particular, to understand. Its pragmatism has made it widely accepted by programmers.
- It’s useful in many applications. It was designed as a general-purpose programming language, so it can cooperate with a variety of other software components, which makes it a good language for gluing together code written in other languages.
- It’s simple and fast to process (both for humans and for tools). It is a very high-level language (VHLL) that affords high programmer productivity, making Python a strong development tool.
- It’s an object-oriented programming language that also supports functional and procedural programming styles.
- It has lots of useful data science-related libraries.


It is important to distinguish a Python implementation from the language itself. A language implementation is a system for executing programs, and Python uses the _interpretation approach_, where an interpreter reads the program as input and performs the actions written in it.

At the time of this writing, __Classic Python__ (also known as CPython, and often called just Python) is the most up-to-date, solid and complete production-quality implementation of the language. CPython consists of a bytecode compiler, an interpreter, and a set of built-in and optional modules, all coded in standard C.
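
As a rough illustration of the bytecode step, the sketch below uses the standard `dis` module to inspect a small function. The function `add_one` is a hypothetical example, and the listing that `dis` prints is specific to CPython and changes between versions, so no expected output is shown.

```python
import dis

def add_one(x):
    # Hypothetical example function, used only to have something to inspect.
    return x + 1

# CPython compiles the function body to bytecode, stored on its code object.
bytecode = add_one.__code__.co_code
print(type(bytecode), len(bytecode))

# dis prints a human-readable listing; the exact instructions vary by version.
dis.dis(add_one)
```

The interpreter executes these instructions one by one, which is the "interpretation approach" described above.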

Another interesting tool is IPython. It is not a separate implementation of the language but an enhancement of the CPython interactive interpreter that makes it more powerful and convenient. The notebook part of IPython has since evolved into __*Jupyter Notebooks*__, an interactive programming environment that, alongside snippets of code, lets you embed commentary in literate programming style and show the output of executing code.

There is more to Python programming than just the language. There are plenty of libraries and extensions that suit almost any application. Most modules work across different versions of Python, and let code access functionality supplied by the underlying operating system or by other software components such as graphical user interfaces (GUIs), databases, and networks. Extensions also afford great speed in computationally intensive tasks such as XML parsing and numeric array computations, which makes them especially suitable for data science.

You do not need to be as proficient in Python as a fluent software developer to process data successfully. To be a high-performing data scientist or data analyst, it’s recommended to get a solid foundation in the basics: the lexical structure, data types, variables, control flow statements and functions.

The Python distribution I recommend the most is Anaconda. It bundles the Python standard library, a number of external extensions and IPython, so you are ready to tackle data analysis tasks.

When doing data science with Python, your code is expected to be written in a _Pythonic way_, meaning it should be concise and efficient. Pythonic code is often associated with list comprehensions, which implement useful data processing functionality in a single line of code.
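
A minimal sketch of what a list comprehension looks like next to the equivalent loop. The `prices` list and the threshold are made-up sample values, not data from this notebook.

```python
# Hypothetical sample data.
prices = [12.5, 7.0, 30.0, 4.25]

# Loop version: build the list step by step.
expensive_loop = []
for price in prices:
    if price > 5:
        expensive_loop.append(price * 2)

# List comprehension: the same filter and transformation in one expression.
expensive = [price * 2 for price in prices if price > 5]

print(expensive)  # [25.0, 14.0, 60.0]
```

Both versions produce the same list; the comprehension states the intent (transform the items that pass a condition) in one place.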


## Lexical structure

The lexical structure of a programming language is the set of basic rules that govern how you write programs in that language. It is the lowest-level syntax of the language, specifying such things as what variable names look like and how to denote comments. 

Each Python source file is a text file that can be viewed as a sequence of lines, tokens, or statements.


### Lines and Indentation

A Python program is a sequence of *logical lines*, each made up of one or more physical lines. A physical line may end with a comment, introduced by a hash sign ( `#` ) placed anywhere outside a string literal. All characters after the `#`, up to but excluding the line end, are the comment: Python ignores them.


In [21]:
# This is a single-line comment 
# There are no double-line \ 
# comments 

Python does not use a delimiter such as the semicolon ( ; ) to mark the end of a statement; the end of the line ends most statements. A logical line can, however, span two or more physical lines, either by ending a physical line with a backslash ( `\` ) or by leaving an open parenthesis ( `(` ), bracket ( `[` ) or brace ( `{` ) unclosed. Physical lines after the first one are called *continuation lines*. Triple-quoted string literals can also span physical lines; they are mostly used for long text such as SQL queries and for longer comments.

In [13]:
# The simplest logical line is a single physical line. In this case, an assignment.
variable = 5

# This statement spans two physical lines but forms a single logical line.
# In this case it assigns a list to a variable. The backslash is optional here,
# because the open bracket already continues the line.
variable = [1, 2,\
            3, 4] # This is the continuation line.
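
The different ways of continuing a logical line can be compared side by side. This is a small sketch with made-up values; the names `total`, `same_total` and `values` are only illustrative.

```python
# Implicit continuation: the open parenthesis keeps the logical line going.
total = (1 + 2
         + 3 + 4)

# Explicit continuation: the backslash must be the last character on the line.
same_total = 1 + 2 \
    + 3 + 4

# Brackets (and braces) continue a line without any backslash.
values = [
    1, 2,
    3, 4,
]

print(total, same_total, values)  # 10 10 [1, 2, 3, 4]
```

Implicit continuation inside parentheses, brackets or braces is usually preferred, since a stray space after a backslash breaks the statement.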

Python uses _indentation_ to express the block structure of a program. Blocks of statements are delimited by indentation rather than by braces or other begin/end delimiters. A __block of statements__ is a contiguous sequence of logical lines, all indented by the same amount, and a logical line with less indentation ends the block. All the statements in a block must have the same indentation, as must all the clauses in a compound statement.

In [22]:
# The first group of statements declares and initializes two variables
num1 = 6
num2 = 9

# The second group computes the sum and prints it
# (named total to avoid shadowing the built-in sum function)
total = num1 + num2
print('This is the output:', total)

# Both groups have the same indentation, so they belong to the same block.

This is the output: 15


Python treats each tab as if it were up to 8 spaces; nevertheless, the standard Python style is to use four spaces per indentation level. __Be careful: Python 3 rejects code that mixes tabs and spaces inconsistently for indentation.__
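
To see how Python reacts to broken indentation without breaking the notebook itself, the sketch below compiles two deliberately wrong snippets held in strings. The snippets and the `<example>` file name are hypothetical; `compile` and the exception classes are built in.

```python
# Two snippets with broken indentation, kept as strings so this cell still runs.
bad_dedent = "if True:\n    x = 1\n  y = 2\n"   # dedent matches no outer level
mixed = "if True:\n\tx = 1\n        y = 2\n"    # a tab on one line, spaces on the next

errors = []
for source in (bad_dedent, mixed):
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        errors.append(type(exc).__name__)

print(errors)
```

Configuring the editor to insert four spaces when the Tab key is pressed avoids both problems.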

### Tokens

These are the _elementary lexical components_ of a logical line. Each token corresponds to a substring of the logical line. Whitespace separates tokens where they would otherwise run together: without it, two adjacent names or keywords would be parsed as a single longer identifier. The normal token types are _identifiers, keywords, operators, delimiters, and literals_.
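
The standard `tokenize` module can show how Python splits a logical line into tokens. This is a minimal sketch: the source line is a made-up example, and only names, operators and numbers are kept to keep the listing short.

```python
import io
import tokenize

# Hypothetical line of source code; note there is no whitespace around '*'.
source = "total = price*2 + 1\n"

# generate_tokens expects a readline-style callable that returns source lines.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in (tokenize.NAME, tokenize.OP, tokenize.NUMBER)
]

for kind, text in tokens:
    print(kind, repr(text))
```

`price*2` is still split into three tokens, because an operator cannot merge with a name. Whitespace only matters between tokens that could otherwise be read as one, such as two names.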

__Identifiers__

These are names used to identify variables, functions, classes, modules or other objects. They always start with a letter or an underscore ( `_` ), followed by any number of letters, digits or underscores. Case is significant: lowercase and uppercase are distinct, and punctuation characters such as `@` , `$` , and `!` are not allowed.

Normal Python style is to start class names with an uppercase letter, and most other identifiers with lowercase letters. Starting an identifier with a single leading underscore indicates by convention that the identifier is meant to be private. Starting an identifier with a double underscore indicates a strongly private identifier; if the identifier also ends with two trailing underscores, however, it is a language-defined special name.

In [3]:
first_variable = 200    # Variables are named with lowercase letters.
                        # Compound identifiers are joined with underscores.
_private_variable = 100 # Private by convention only; Python does not enforce it

print("This is a regular variable: ",first_variable,", This is a private variable: ", _private_variable)

This is a regular variable:  200 , This is a private variable:  100
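
The naming rules can be checked with the built-in `str.isidentifier` method. The candidate names below are made up for illustration.

```python
candidates = ["first_variable", "_private", "2nd_value", "total$", "Total", "class"]

for name in candidates:
    # isidentifier checks only the lexical form, so it also accepts keywords
    # such as 'class', which still cannot be used as variable names.
    print(f"{name!r:18} valid identifier: {name.isidentifier()}")

# Case is significant: these are two different names.
total = 1
Total = 2
print(total, Total)  # 1 2
```

`2nd_value` fails because it starts with a digit, and `total$` fails because `$` is not allowed in identifiers.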


__Delimiters__

Python uses the following characters and character combinations as delimiters in various statements, expressions, and list, dictionary, and set literals and comprehensions, among other purposes. In addition, `'` and `"` surround string literals.

In [6]:
# Delimiters are listed as comments because they are not valid code on their own.
# (   )   [   ]   {   }
# ,   :   .   =   ;   @
# +=  -=  *=  /=  //= %=
# &=  |=  ^=  >>= <<= **=

__Keywords__

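Python reserves a set of identifiers for syntactic use, which is why they are sometimes known as *reserved words*. There are 35 of them in recent Python 3 versions; the list below comes from Python 3.9, which also included the internal `__peg_parser__` entry that was removed in 3.10. Like any other identifiers, keywords are case sensitive. They can all be listed by importing the `keyword` module and printing its `kwlist` attribute, as follows: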

In [5]:
import keyword 
print(keyword.kwlist)


['False', 'None', 'True', '__peg_parser__', 'and', 'as', 'assert', 'async', 'await', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']
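
The `keyword` module also offers `iskeyword` to test a single name. The short sketch below shows the case sensitivity of keywords and what happens when one is used as a variable name; the snippet passed to `compile` is a hypothetical example.

```python
import keyword

# The count depends on the Python version running this cell.
print(len(keyword.kwlist))

# Keywords are case sensitive: True is reserved, true is an ordinary name.
print(keyword.iskeyword("True"), keyword.iskeyword("true"))  # True False

# Using a keyword as a variable name is rejected when the code is compiled.
try:
    compile("class = 5", "<example>", "exec")
except SyntaxError as exc:
    rejected = True
    print("SyntaxError:", exc.msg)
else:
    rejected = False
```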


__Literals__

These are direct denotations in a program of a data value (a number, string, or container). The following are number and string literals in Python: 


In [7]:
35             #Integer literal
3.14           #Float literal
1.0j           #Imaginary literal
'Hello'        #String literal 
"world"        #Another string literal 
"""Good
night"""       #Triple-quoted string literal, spanning two lines

'Good\nnight'

Combining numbers and string literals with the appropriate delimiters, you can directly build many container types with those literals as values:


In [8]:
[25,4.52,'Hello']       #This is a list
[]                      #This is an empty list
200,305,567             #This is a tuple
(200,305,567)           #This is another tuple
()                      #This is an empty tuple
{'a':5, 'b':67}         #This is a dictionary
{}                      #This is an empty dictionary
{2, 5, 6, 7, 'letra'}   #This is a set
#There is no literal to denote an empty set. 

{2, 5, 6, 7, 'letra'}
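
Two details of container literals are easy to get wrong: the empty braces and the one-element tuple. A small sketch with illustrative variable names:

```python
empty_braces = {}        # an empty dictionary, not a set
empty_set = set()        # the way to build an empty set
single_tuple = (200,)    # the trailing comma makes the tuple, not the parentheses
not_a_tuple = (200)      # just the integer 200 in parentheses

print(type(empty_braces).__name__, type(empty_set).__name__)    # dict set
print(type(single_tuple).__name__, type(not_a_tuple).__name__)  # tuple int
```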

__Operators__

Python uses non-alphanumeric characters and character combinations as operators. They generally act within an expression, which is defined as a *phrase* of code that Python evaluates to produce a value. The simplest expressions are literals and identifiers, and you can build other expressions by joining subexpressions with operators and/or delimiters. The simplest operators are those that represent basic math operations:


In [16]:
#These are variables
a=15
b=3       

# Then we use operators as follows

# Sum:
c = a + b
# Subtraction:
d = a - b
# Product 
e = a * b
# Division
f = a / b

print('The result of the sum is: ', c)
print('The result of the subtraction is: ', d)
print('The result of the product is: ', e)
print('The result of the division is: ', f) # The result of a division is always a float

The result of the sum is:  18
The result of the subtraction is:  12
The result of the product is:  45
The result of the division is:  5.0
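
Beyond the four basic operations, a few more arithmetic and comparison operators are worth knowing. This sketch uses different sample values (`b = 4`) so that the results of the division variants differ.

```python
a = 15
b = 4

print(a / b)    # 3.75  true division, always a float
print(a // b)   # 3     floor division
print(a % b)    # 3     remainder
print(a ** 2)   # 225   exponentiation
print(a > b)    # True  comparison operators produce booleans
```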


### Statements 

There are two types of statements: simple and compound.

__Simple Statements__

A simple statement is one that contains no other statements. It lies entirely within a logical line. In Python, you may place more than one simple statement in a logical line, with a semicolon (` ; `) as a separator. However, it is recommended to use one simple statement per line to increase readability. Any expression can be on its own as a simple statement. 

An *assignment* is a simple statement that binds values to variables using the ` = ` operator. An assignment with ` = ` can never be part of an expression; when you need to assign inside an expression, use the ` := ` (walrus) operator instead.


In [1]:
# This is a simple statement
variable = 345.76j
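
The difference between the assignment statement and the walrus operator, as well as the semicolon separator, can be sketched as follows. The `values` list is made-up sample data, and the walrus operator requires Python 3.8 or later.

```python
values = [4, 8, 15, 16, 23, 42]

# "total = sum(values)" is a statement, so it cannot appear inside the condition.
# The walrus operator assigns and yields the value in a single expression.
if (total := sum(values)) > 100:
    message = f"large total: {total}"
else:
    message = f"small total: {total}"

print(message)  # large total: 108

# Several simple statements on one logical line, separated by semicolons
# (legal, but one statement per line is easier to read).
x = 1; y = 2; print(x + y)  # 3
```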

__Compound Statements__

A compound statement contains one or more other statements and controls their execution. It has one or more *clauses*, aligned at the same indentation. Each clause has a *header*, which starts with a keyword and ends with a colon ( : ), followed by a *body*, which is a sequence of one or more statements. The statements of the body, also known as a block, are on separate logical lines after the header line, indented four spaces rightward by convention.

In [2]:
# This is a compound statement 
for i in range(10):
    print("This is the ", i, "iteration of the loop")

This is the  0 iteration of the loop
This is the  1 iteration of the loop
This is the  2 iteration of the loop
This is the  3 iteration of the loop
This is the  4 iteration of the loop
This is the  5 iteration of the loop
This is the  6 iteration of the loop
This is the  7 iteration of the loop
This is the  8 iteration of the loop
This is the  9 iteration of the loop
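
A compound statement can have more than one clause. The sketch below uses a hypothetical `describe` function whose `if` statement has three clauses (`if`, `elif`, `else`), each with its own header and indented body.

```python
def describe(n):
    # One compound statement with three clauses, all aligned at the same indentation.
    if n < 0:
        label = "negative"
    elif n == 0:
        label = "zero"
    else:
        label = "positive"
    return label

for value in (-3, 0, 7):
    print(value, describe(value))
```

The `def` and `for` lines are headers of compound statements too, so this example nests one compound statement inside another.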


## Data Types



## Python Ecosystem
_Pandas:_ For data analysis. <br>
_Matplotlib:_ The foundational library for visualization. <br>
_NumPy:_ The numeric library that serves as the foundation of most numerical computation in Python. <br>
_Seaborn:_ A statistical visualization tool built on top of Matplotlib. <br>
_Statsmodels:_ A library with many advanced statistical functions. <br>
_SciPy:_ Advanced scientific computing, including functions for optimization, linear algebra, image processing and more. <br>
_scikit-learn:_ The most popular machine learning library for Python (not deep learning). <br>

Among many other tools for specific use-cases.