# Python Packages

## Modules

Modules are files containing a set of functions you want to include in your application.
Modules are useful because you can store a handy function that you need often in a module, and then call the function from the module whenever you need it.

In Python, you can put definitions in a file and use them in a script. Such files are known as modules. Definitions and functions in a module can be imported into other modules.

To create a module, save the code in a file with the `.py` extension.

A docstring is a string literal that usually appears as the first statement in a module. It is used to explain what the module does. The docstring becomes the `__doc__` special attribute of the module object.
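The two points above (a module is just a `.py` file, and its docstring becomes `__doc__`) can be checked with a small sketch. It assumes a hypothetical module named `greetings`, written to a temporary directory that is added to `sys.path` so the file can be imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a small module file; "greetings" is a hypothetical module name.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "greetings.py").write_text(
    '"""Helpers for building greeting messages."""\n'
    "\n"
    "def greet(name):\n"
    '    return f"Hello, {name}!"\n'
)

sys.path.insert(0, str(module_dir))  # make the temporary directory importable
importlib.invalidate_caches()

import greetings

print(greetings.greet("Ada"))  # Hello, Ada!
print(greetings.__doc__)       # Helpers for building greeting messages.
```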

Python has several built-in modules that can be imported and used at any time.

Example: 


```
import platform

x = platform.system()
print(x)
```

You can also use the `dir()` function to list all the defined names belonging to the platform module.

Example:


```
import platform

x = dir(platform)
print(x)
```

### Import

A Python module can access code from another module by importing it with the `import` statement. The `import` statement calls the built-in `__import__()` function, which first checks the cache of already loaded modules (`sys.modules`) and otherwise searches the import path for the module. The module returned by `__import__()` is then bound to a name in the importing code.

The basic import statement is executed in two steps. First, it finds a module, and loads and initializes it if necessary. Second, it defines a name or names in the local namespace for the scope where the import statement occurs.
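The two steps can be observed with the standard `math` module, used here only as a convenient example: after the import, the module sits in the `sys.modules` cache, and a second import reuses the cached object instead of loading it again.

```python
import sys
import math

# Step 1 stored the loaded module in the sys.modules cache;
# step 2 bound the name "math" in this namespace.
print("math" in sys.modules)        # True
print(sys.modules["math"] is math)  # True

# A second import finds the cached module instead of loading it again.
import math as math_again
print(math_again is math)           # True

# The import statement uses __import__() under the hood.
print(__import__("math") is math)   # True
```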

Each import statement should usually go on its own line.

Examples:


```
import os
import sys
```

Once the requested module has been retrieved successfully, it is made available in the local namespace in one of three main ways.

First, with a plain `import` statement: if the imported module is a top-level module, its name is bound in the local namespace as a reference to the imported module.

Example: 


```
import stockdata  # stockdata is an example module name
```

Second, you can create an alias when you import a module by using the `as` keyword. The name after `as` is then bound directly to the imported module.

Example: 


```
import stockdata as SD
```

Third, if the imported module is not a top-level module, only the name of the top-level package that contains the module is bound in the local namespace. The imported module must then be accessed using its fully qualified name.

Example: 


```
import pandas.plotting  # binds only "pandas"; access the submodule as pandas.plotting
```


We can also use `from` together with the `import` statement.

`from X import *` imports module `X` and creates references in the current namespace to all public objects defined by `X`; `from X import name1, name2` creates references only to the names you list.
After this statement runs, you can refer to those objects by their plain names, such as `name1`. However, `X` itself is not bound, so `X.name1` does not work.
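A short sketch of the `from ... import` behaviour described above, using the standard `math` module as the example: the listed names are usable directly, but the module name itself is not bound.

```python
from math import sqrt, pi  # bind only the listed names

print(sqrt(16.0))    # 4.0
print(round(pi, 2))  # 3.14

try:
    math.sqrt(16.0)  # "math" itself was never bound by the from-import
except NameError as exc:
    print("NameError:", exc)
```

Explicit names are generally preferred over `from X import *`, because they make it clear where each name comes from.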
 


## Packages

There are two types of packages in Python:

1. Regular packages

    Regular packages are traditional packages as they existed in Python 3.2 and earlier. A regular package is typically implemented as a directory containing an `__init__.py` file. When a regular package is imported, this `__init__.py` file is implicitly executed, and the objects it defines are bound to names in the package’s namespace.

2. Namespace packages

    A namespace package is a composite of various portions, where each portion contributes a subpackage to the parent package. Portions may reside in different locations on the file system. With namespace packages, there is no `parent/__init__.py` file. There may be multiple parent directories found during import search, where each one is provided by a different portion.
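The difference between the two package types can be sketched on disk. This assumes two hypothetical package names, `demo_regular_pkg` and `demo_ns_pkg`, created in temporary directories; the directories `site_a` and `site_b` stand in for portions located in different places on the file system.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
dir_a = root / "site_a"
dir_b = root / "site_b"

# Regular package: a directory with an __init__.py that runs on import.
(dir_a / "demo_regular_pkg").mkdir(parents=True)
(dir_a / "demo_regular_pkg" / "__init__.py").write_text('GREETING = "set by __init__.py"\n')

# Namespace package: the same directory name in two places, with no __init__.py.
for base, part in [(dir_a, "part_a"), (dir_b, "part_b")]:
    pkg_dir = base / "demo_ns_pkg"
    pkg_dir.mkdir(parents=True)
    (pkg_dir / f"{part}.py").write_text(f'NAME = "{part}"\n')

sys.path[:0] = [str(dir_a), str(dir_b)]  # both portions are on the import path
importlib.invalidate_caches()

import demo_regular_pkg
import demo_ns_pkg.part_a
import demo_ns_pkg.part_b

print(demo_regular_pkg.GREETING)                         # set by __init__.py
print(demo_ns_pkg.part_a.NAME, demo_ns_pkg.part_b.NAME)  # part_a part_b
print(len(list(demo_ns_pkg.__path__)))                   # 2: one entry per portion
```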


### Searching for package

To begin the search, Python needs the fully qualified name of the module (or package, but for the purposes of this discussion, the difference is immaterial) being imported. This name may come from various arguments to the import statement, or from the parameters to the `importlib.import_module()` or `__import__()` functions. This name will be used in various phases of the import search, and it may be the dotted path to a submodule, e.g. `foo.bar.baz`. In this case, Python first tries to import `foo`, then `foo.bar`, and finally `foo.bar.baz`. If any of the intermediate imports fail, a `ModuleNotFoundError` is raised.
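The dotted-name behaviour can be seen with `importlib.import_module()` and the standard `xml.etree.ElementTree` module. The name `xml.no_such_subpackage.module` below is deliberately nonexistent, to trigger the failure case.

```python
import importlib
import sys

# Importing a dotted name imports each parent first:
# xml, then xml.etree, then xml.etree.ElementTree.
et = importlib.import_module("xml.etree.ElementTree")
print([name in sys.modules for name in ("xml", "xml.etree", "xml.etree.ElementTree")])

# If an intermediate import fails, ModuleNotFoundError is raised.
failed = False
try:
    importlib.import_module("xml.no_such_subpackage.module")  # hypothetical, nonexistent name
except ModuleNotFoundError as exc:
    failed = True
    print("ModuleNotFoundError:", exc)
```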



## Pandas and dataframe

In computer programming, Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. 

A DataFrame is a two-dimensional labelled data structure with columns of potentially different types. A DataFrame has three components: rows, columns and data. The DataFrame is also generally the most commonly used pandas object.

Let us start with an example:


```
import pandas as pd
df = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
                  index=[1, 2, 3])
```

which would give us:
```
   a  b   c
1  4  7  10
2  5  8  11
3  6  9  12
```

To start, we import pandas with `import pandas as pd`, which imports pandas and binds it to the alias `pd` chosen by the developer.

Here we define the DataFrame by entering the data directly. However, a more common way to create a DataFrame is to read it from a file, such as a .csv file. To read a .csv file, use:


```
pd.read_csv(filepath)
```

For the complete list of functions for reading other file types, see https://pandas.pydata.org/pandas-docs/stable/reference/io.html.
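A minimal sketch of `pd.read_csv`, assuming the CSV content is held in memory with `io.StringIO` instead of a real file path (`pd.read_csv` accepts either). Without further arguments, the rows get the default index 0, 1, 2.

```python
import io

import pandas as pd

# io.StringIO stands in for a .csv file on disk.
csv_text = "a,b,c\n4,7,10\n5,8,11\n6,9,12\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.shape)  # (3, 3): 3 rows, 3 columns
```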

Once a DataFrame is loaded, pandas can extract particular subsets of the data, for example:
```
df.head(2) 
```
which gives us:
```
   a  b   c
1  4  7  10
2  5  8  11
```

This returns only the first 2 rows of data. Similarly, `df.tail(n)` returns the last `n` rows.
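For comparison, here is `tail` on the same example DataFrame. When called without an argument, both `head()` and `tail()` default to 5 rows, so on this 3-row DataFrame they return everything.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
                  index=[1, 2, 3])

print(df.tail(2))  # the last 2 rows (labels 2 and 3)
print(df.head())   # default n=5, so all 3 rows are returned here
```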

Other accessors, such as `df.loc`, let you select data by label. For example:

```
df.loc[1] 
```
gives us:
```
a     4
b     7
c    10
Name: 1, dtype: int64
```

`df.loc` selects rows or columns by their labels. As our example shows, `df.loc[1]` returns the row whose index label is `1` as a Series, displayed along with its name and data type.


Passing a list of labels inside the brackets, as in `df.loc[[1, 2]]`, returns another DataFrame containing both rows `1` and `2`:
```
df.loc[[1,2]]
```
which gives us:
```
   a  b   c
1  4  7  10
2  5  8  11
```



On the other hand, `df.iloc[]` selects data by its integer position. The 'i' in `iloc` stands for integer. For example:
```
df.iloc[1]
```
gives us:
```
a     5
b     8
c    11
Name: 2, dtype: int64
```

`df.iloc` may look similar to `df.loc`, but they work differently. `df.loc[1]` finds the row labelled `1`, while `df.iloc[1]` returns the row at position 1. Remember that Python indexing starts from 0, so the row at position 1 is actually the second row of the DataFrame, and `df.iloc[0]` returns the first row.
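The label-versus-position distinction can be checked side by side on the example DataFrame. The slicing lines show one further difference: label slices with `loc` include the end label, while position slices with `iloc` exclude the end position, as in ordinary Python slicing.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
                  index=[1, 2, 3])

print(df.loc[1, "a"])                # label 1 is the first row -> 4
print(df.iloc[1]["a"])               # position 1 is the second row (label 2) -> 5
print(df.iloc[0].equals(df.loc[1]))  # True: position 0 holds the row labelled 1

print(df.loc[1:2])   # label slice includes both ends: rows 1 and 2
print(df.iloc[1:2])  # position slice excludes the end: only position 1
```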


You can also modify the DataFrame, for example by adding a new column:
```
df['d'] = df['a']+df['b']  

```
which gives us:
```
   a  b   c   d
1  4  7  10  11
2  5  8  11  13
3  6  9  12  15
```

This creates a new column 'd' in the DataFrame that is the sum of columns 'a' and 'b'.

pandas can also generate useful summaries of the data. For example:


```
df.describe()
```
gives us:
```
         a    b     c     d
count  3.0  3.0   3.0   3.0
mean   5.0  8.0  11.0  13.0
std    1.0  1.0   1.0   2.0
min    4.0  7.0  10.0  11.0
25%    4.5  7.5  10.5  12.0
50%    5.0  8.0  11.0  13.0
75%    5.5  8.5  11.5  14.0
max    6.0  9.0  12.0  15.0
```

This shows summary statistics for the numeric columns. For more specific statistics, you can use methods like `df.mean()`, which returns the mean of each column, and `df.corr()`, which returns the pairwise correlation between the columns of a DataFrame.
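A small sketch of `df.mean()` and `df.corr()` on the example data. `corr()` uses Pearson correlation by default; every pair of columns here is perfectly linear, so all entries come out as 1.0 (the result is rounded to hide floating-point noise).

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
                  index=[1, 2, 3])

means = df.mean()     # Series with one mean per column
print(means)

corr = df.corr()      # DataFrame of pairwise Pearson correlations
print(corr.round(6))
```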

```
df.to_csv(filename)
```

This exports the DataFrame to a .csv file. There are similar methods for other formats, such as `df.to_excel`, which writes the data to an Excel file.
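A round-trip sketch: write the example DataFrame with `to_csv` and read it back with `read_csv`. The file name `example.csv` in a temporary directory is a hypothetical path. Because `to_csv` writes the index as the first column by default, `index_col=0` is used to restore it.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
                  index=[1, 2, 3])

# "example.csv" in a temporary directory is a hypothetical file path.
path = os.path.join(tempfile.mkdtemp(), "example.csv")
df.to_csv(path)                            # the index is written as the first column
restored = pd.read_csv(path, index_col=0)  # use that first column as the index again
print(restored)
```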

## Numpy

NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects, and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

The ndarray object is the core of the NumPy package. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:


- NumPy arrays have a fixed size at creation, unlike Python lists, which can grow dynamically. Changing the size of an ndarray creates a new array and deletes the original.

- The elements in a NumPy array are all required to be of the same data type. The exception is arrays of objects, which can hold arbitrary Python objects.

- NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.

- A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today’s scientific/mathematical Python-based software, just knowing how to use Python’s built-in sequence types is insufficient; one also needs to know how to use NumPy arrays.
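The first three differences can be seen in a few lines. `np.append` is used here to show that "growing" an array produces a new array; the original is left unchanged.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)    # returns a new, larger array; a keeps its size
print(a.size, b.size)  # 3 4

# Object arrays are the exception to the single-type rule.
mixed = np.array([1, "two", 3.0], dtype=object)
print(mixed.dtype)     # object

# Element-wise arithmetic without an explicit Python loop
print(a * 2 + 1)       # [3 5 7]
```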


To use NumPy, we import it with `import numpy as np`, which imports NumPy and binds it to the alias `np`. Developers are free to choose a different alias, but `np` is strongly recommended because it is the common convention. We can then create NumPy arrays as in the following example:

```
import numpy as np

index = np.array([0, 1, 2, 3, 4])  # convert the list [0, 1, 2, 3, 4] to a 1-D NumPy array
index2 = np.array(1)  # a 0-dimensional ndarray holding the value 1
# convert a nested list to a 2-D array; every row must have the same length
index3 = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
```

As previously mentioned, an ndarray requires all elements to be of the same type, so a list that mixes integers and floats is converted to an array of floats.

```
import numpy as np

index4 = np.array([0, 1, 2, 3.0])  # all elements are converted to float
print(index4)  # [0. 1. 2. 3.]
```
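The arrays above can be inspected with the standard ndarray attributes `ndim` (number of dimensions), `shape` (size along each dimension) and `dtype` (element type). The integer dtype's exact size can vary by platform, so the sketch prints only its kind (`i` for signed integer).

```python
import numpy as np

index = np.array([0, 1, 2, 3, 4])
index2 = np.array(1)
index3 = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
index4 = np.array([0, 1, 2, 3.0])

print(index.ndim, index.shape)    # 1 (5,)
print(index2.ndim, index2.shape)  # 0 ()
print(index3.ndim, index3.shape)  # 2 (2, 5)
print(index.dtype.kind, index4.dtype)  # i float64
```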

### Numpy round

`numpy.round` is a built-in NumPy function that rounds every element of a NumPy array (ndarray). It takes two arguments: the first is the NumPy array, and the second is the number of decimal places to round to (`decimals`, which defaults to 0). It rounds to the nearest value rather than always rounding up. The following example demonstrates the usage and syntax of `numpy.round`:

```
import numpy as np

data = np.array([0.5, 1.35, 2.68, 3.10, 4.9000])  # convert the list to a NumPy array
rounded = np.round(data, 2)  # round every element in the array to two decimal places
print(rounded)
```
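Two details of `numpy.round` are worth a quick check: values exactly halfway between two candidates are rounded to the nearest even number, and a negative `decimals` rounds to tens, hundreds, and so on. If you really need to round up, `np.ceil` is a separate function for that.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])
print(np.round(halves))                # [0. 2. 2. 4.]  ties go to the even neighbour
print(np.round(1234.5, -2))            # 1200.0  negative decimals round to hundreds here
print(np.ceil(np.array([0.1, 1.2])))   # [1. 2.]  always rounds up
```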

### Numpy NaN

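`numpy.nan` is a floating-point (float) representation of Not a Number (NaN). It is commonly used to mark missing numeric values, a role similar to Python’s `None`, but unlike `None` it is a float.
NaN can be assigned to an element of a NumPy array that has a float data type, or to a cell in a DataFrame. The following example demonstrates it: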


```
import numpy as np

index = np.array(1.0)  # a 0-dimensional float ndarray with the value 1.0
index.fill(np.nan)     # assign np.nan to its single element
print(index)           # nan

index2 = np.array([1.0, 2.0, 3.0])
# at least one float element makes the whole array float, which is needed to store NaN
index2[0] = np.nan     # assign np.nan to index2[0]
print(index2)          # [nan  2.  3.]
```
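Because NaN is a float that is not equal to anything, including itself, it has to be detected with `np.isnan` rather than `==`. Ordinary reductions such as `mean()` propagate NaN, while NumPy's nan-aware variants such as `np.nanmean` skip it.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

print(np.nan == np.nan)    # False: NaN is not equal even to itself
print(np.isnan(values))    # [False  True False]
print(values.mean())       # nan: the NaN propagates
print(np.nanmean(values))  # 2.0: NaN is ignored
```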


## Datetime

`datetime` is a module that provides classes for manipulating dates and times. In Python, a date is not a data type of its own, but we can import the `datetime` module to work with dates as date objects.

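To use `datetime`, we have to import it first with one of the following statements. The first imports only the `datetime` class from the module; the second imports the whole module, whose classes are then accessed as `datetime.datetime`, `datetime.date`, and so on.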


```
from datetime import datetime
```
OR
```
import datetime
```
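With the whole module imported, the `date` and `timedelta` classes support simple date arithmetic. The dates below are arbitrary examples.

```python
import datetime

d = datetime.date(2018, 11, 12)          # a date object with no time part
later = d + datetime.timedelta(days=30)  # add 30 days
print(later)                             # 2018-12-12
print((later - d).days)                  # 30
```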

### Strptime

`strptime()` is a class method of the `datetime` class that creates a `datetime` object from a given string. Note that not every string can be converted to a `datetime` object; it needs to be in a certain format. `strptime()` takes two parameters: a string (to be converted to `datetime`) and a format code. It raises a `ValueError` exception if the string does not match the format. For example:


```
from datetime import datetime

dateString = "12/11/2018 09:15:32"
dateObj = datetime.strptime(dateString, "%d/%m/%Y %H:%M:%S")  # day/month/year hour:minute:second
print(dateObj)
```
gives us:
```
2018-11-12 09:15:32
```

For more information on the format codes that can be used, see https://docs.python.org/3/library/datetime.html
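A sketch of the `ValueError` case and of the reverse direction: `strftime()` turns a `datetime` back into a string using the same format codes. The mismatched string below is an arbitrary example.

```python
from datetime import datetime

fmt = "%d/%m/%Y %H:%M:%S"

raised = False
try:
    datetime.strptime("2018-11-12 09:15:32", fmt)  # ISO-style string does not match fmt
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

date_obj = datetime.strptime("12/11/2018 09:15:32", fmt)
print(date_obj.strftime(fmt))  # 12/11/2018 09:15:32
```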

## Matplotlib

Matplotlib is an open-source, low-level graph plotting library for Python that allows us to visualise data. As with the other libraries, we need to import Matplotlib first:

```
import matplotlib
```

### pyplot

Most Matplotlib plotting is done through the `pyplot` submodule, which we also need to import first, usually under the alias `plt`.

```
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([3, 21])
ypoints = np.array([8, 56])

plt.plot(xpoints, ypoints) 
plt.show()
```
will plot the following graph:

![matplotlib-plot](asset/img/matplotlib-plot.png)

The `plot(x, y)` function draws a line from point to point. Parameter 1 is an array containing the points on the x-axis (horizontal), and parameter 2 is an array containing the points on the y-axis (vertical). `plot(y)` is also possible; in that case x is an index array running from 0 to N-1, where N is the number of points.
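A sketch of the `plot(y)` form. It assumes the non-interactive `Agg` backend so it runs without a display, uses the `fig, ax = plt.subplots()` interface instead of `plt.plot`/`plt.show()`, and saves to a hypothetical file in a temporary directory. The line's x data confirms that x defaults to 0 to N-1.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

fig, ax = plt.subplots()
line, = ax.plot(ypoints, marker="o")  # only y is given, so x defaults to 0, 1, ..., N-1
print(line.get_xdata())

fig.savefig(Path(tempfile.mkdtemp()) / "plot_y_only.png")  # hypothetical output file
```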

### Date

Beyond Python’s own `datetime` module, Matplotlib provides sophisticated date plotting capabilities. Start by importing `matplotlib.dates` as `mdates`.

```
import matplotlib.dates as mdates
```

Python `datetime` objects and Matplotlib dates are represented differently: Matplotlib stores dates as floating-point numbers of days since an epoch. The `date2num` function converts `datetime` objects to Matplotlib dates, and `num2date` does the opposite, converting Matplotlib dates back to `datetime` objects.
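A round-trip sketch with `date2num` and `num2date`. The exact float depends on Matplotlib's epoch setting (it changed in version 3.3), so the number is printed rather than hard-coded. `num2date` returns a timezone-aware `datetime` (UTC by default), so the timezone is dropped before comparing.

```python
from datetime import datetime

import matplotlib.dates as mdates

dt = datetime(2018, 11, 12)
num = mdates.date2num(dt)    # float: days since Matplotlib's epoch
back = mdates.num2date(num)  # timezone-aware datetime
print(num)
print(back)
print(back.replace(tzinfo=None) == dt)  # True
```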

Also, tick formatters such as `DateFormatter(fmt, tz)` use strftime format strings. Here, `fmt` is a strftime format string and is always required; `tz` is the timezone and is optional. If `tz` is left as `None`, Matplotlib uses its default timezone setting (`rcParams["timezone"]`, which is UTC by default).
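A sketch of `DateFormatter` on a date axis, with made-up values and the `Agg` backend so no window is needed. Calling the formatter directly on a Matplotlib date shows the tick label it would produce with the `%d/%m` format.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

days = [datetime(2018, 11, 12) + timedelta(days=i) for i in range(5)]
values = [3, 5, 2, 6, 4]  # made-up values for illustration

fig, ax = plt.subplots()
ax.plot(days, values)  # datetime x values are converted to Matplotlib dates automatically
formatter = mdates.DateFormatter("%d/%m")  # strftime codes: day/month
ax.xaxis.set_major_formatter(formatter)
fig.autofmt_xdate()  # rotate the tick labels so they do not overlap

label = formatter(mdates.date2num(days[0]))
print(label)  # 12/11
```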
