# Data Science for Manufacturing - Workshop 1-1

## Objectives

- Introduction to the Jupyter Notebook environment.

- Explain what a library is and what libraries are used for.

- Basic data types in Python.

- Create a new variable in Python.

- Using functions.

- Loading data.

- Import a Python library and use the functions it contains.

- Read tabular data from a file into a program.



### The Notebook Interface

[Jupyter Notebook cheatsheet](https://www.datacamp.com/community/blog/jupyter-notebook-cheat-sheet)

### Essential shortcuts

- **Run cells**: `Ctrl + Enter`
- **Run cells and select below**: `Shift + Enter`
- **Run cells and add cell below**: `Alt + Enter`
- **Change cell type to code**: `Esc`, then `Y`
- **Change cell type to markdown**: `Esc`, then `M`
- **Add cell below**: `Esc`, then `B`

### Cells


#### Code cell

In [None]:
print('Hello World!')

#### Markdown cell

This is a Markdown cell  

[Markdown cheatsheet](https://www.markdownguide.org/cheat-sheet/)

### Kernels

The kernel is where the code is executed when you run a code cell. The output is returned to the cell to be displayed. The kernel's state persists beyond individual cells: for example, once you import a library, that library is available throughout the document.

In [None]:
import numpy as np

In [None]:
x = np.random.randint(1, 10)
print(x)

### Variables

Any Python interpreter can be used as a calculator:

In [None]:
3 + 5 * 4
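
The usual arithmetic precedence applies, so the multiplication in the expression above is carried out before the addition. A minimal sketch of the difference parentheses make (the variable names are chosen only for this illustration):

```python
# Multiplication binds tighter than addition, so 5 * 4 is evaluated first.
without_parentheses = 3 + 5 * 4

# Parentheses force the addition to happen first.
with_parentheses = (3 + 5) * 4

print(without_parentheses)  # 23
print(with_parentheses)     # 32
```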

This is great but not very interesting. To do anything useful with data, we need to assign its value to a variable. In Python, we can assign a value to a variable, using the equals sign =. For example, we can track the weight of a patient who weighs 60 kilograms by assigning the value 60 to a variable weight_kg:

In [None]:
weight_kg = 60

From now on, whenever we use weight_kg, Python will substitute the value we assigned to it. In layman’s terms, a variable is a name for a value.

- Variables are names for values.
- In Python the = symbol assigns the value on the right to the name on the left.
- The variable is created when a value is assigned to it.

In Python, variable names:

- can only contain letters, digits, and underscore _ (typically used to separate words in long variable names)
- cannot start with a digit
- are case sensitive (age, Age and AGE are three different variables)

- Variable names that start with underscores like *__alistairs_real_age* have a special meaning so we won’t do that until we understand the convention.

This means that, for example:

- weight is a valid variable name, whereas 0weight is not
- weight and Weight are different variables
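
One way to try these rules out is to let Python check candidate names itself. The sketch below uses the string method `isidentifier()` together with the standard `keyword` module; the helper `is_valid_name` and the list of candidate names are hypothetical and exist only for this illustration.

```python
import keyword


def is_valid_name(name):
    """Return True if name could be used as a variable name."""
    # isidentifier() applies the naming rules; reserved words such as
    # 'for' follow the rules but still cannot be used as variable names.
    return name.isidentifier() and not keyword.iskeyword(name)


# Hypothetical candidate names, chosen only for illustration.
for candidate in ['weight', '0weight', 'weight_kg', 'weight-kg', 'for']:
    print(candidate, is_valid_name(candidate))
```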


<span style="color:red">**Variables must be created before they are used.**</span>
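
Using a name before assigning it raises a `NameError`. The following sketch catches the error so that the cell keeps running; `thread_count` is a hypothetical variable used only here, and the sketch assumes it has not been assigned anywhere earlier in the session.

```python
try:
    # thread_count has not been assigned yet, so this line fails.
    print(thread_count)
except NameError as error:
    message = str(error)
    print('NameError:', message)

# Once a value is assigned, the name can be used.
thread_count = 12
print(thread_count)
```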

### Types of data

Python knows various types of data. Three common ones are:

- integer numbers
- floating point numbers, and
- strings.

In the example above, variable weight_kg has an integer value of 60. If we want to more precisely track the weight of our patient, we can use a floating point value by executing:

In [None]:
weight_kg = 60.3

To create a string, we add single or double quotes around some text. To identify and track a patient throughout our study, we can assign each person a unique identifier by storing it in a string:

In [None]:
part_id = '001'

### Using Variables

Once we have data stored with variable names, we can make use of it in calculations. We may want to store our patient’s weight in pounds as well as kilograms:

In [None]:
weight_lb = 2.2 * weight_kg

We might decide to add a prefix to our patient identifier:

In [None]:
part_id = 'manu_' + part_id

<span style="color:red">**Python is case-sensitive.**</span>

- Python thinks that upper- and lower-case letters are different, so Name and name are different variables.
- There are conventions for using upper-case letters at the start of variable names so we will use lower-case letters for now.


### Built-in Python functions

To carry out common tasks with data and variables in Python, the language provides us with several built-in functions. To display information to the screen, we use the print function. 

- Call the function (i.e., tell Python to run it) by using its name.
- Provide values to the function (i.e., the things to print) in parentheses.
- To add a string to the printout, wrap the string in single or double quotes.
- The values passed to the function are called arguments.

In [None]:
print(weight_lb)
print(part_id)

When we want to make use of a function, referred to as calling the function, we follow its name by parentheses. The parentheses are important: if you leave them off, the function doesn’t actually run! Sometimes you will include values or variables inside the parentheses for the function to use. In the case of print, we use the parentheses to tell the function what value we want to display. We will learn more about how functions work and how to create our own in later workshops.
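
To see what leaving off the parentheses means in practice, the sketch below uses the built-in function `len` as an example. The value of `part_id` is set again here so that the cell runs on its own, and `length_function` and `id_length` are names made up for the illustration.

```python
part_id = 'manu_001'

# Without parentheses we only refer to the function itself; nothing runs.
length_function = len
print(length_function)

# With parentheses the function is called and returns a result.
id_length = len(part_id)
print(id_length)  # 8
```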

We can display multiple things at once using only one print call:

In [None]:
print(part_id, 'weight in kilograms:', weight_kg)

We can also call a function inside of another function call. For example, Python has a built-in function called type that tells you a value’s data type:

In [None]:
print(type(60.3))
print(type(part_id))

Moreover, we can do arithmetic with variables right inside the print function:

In [None]:
print('weight in pounds:', 2.2 * weight_kg)

The above command, however, did not change the value of weight_kg:

In [None]:
print(weight_kg)

To change the value of the weight_kg variable, we have to assign weight_kg a new value using the equals = sign:

In [None]:
weight_kg = 65.0
print('weight in kilograms is now:', weight_kg)

A variable in Python is analogous to a sticky note with a name written on it: assigning a value to a variable is like putting that sticky note on a particular value.

![Value of 65.0 with weight_kg label stuck on it](https://swcarpentry.github.io/python-novice-inflammation/fig/python-sticky-note-variables-01.svg)

Using this analogy, we can investigate how assigning a value to one variable does not change values of other, seemingly related, variables. For example, let’s store the subject’s weight in pounds in its own variable:

In [None]:
# There are 2.2 pounds per kilogram
weight_lb = 2.2 * weight_kg
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)

![Value of 65.0 with weight_kg label and value of 143 with pounds label stuck on it](https://swcarpentry.github.io/python-novice-inflammation/fig/python-sticky-note-variables-02.svg)

Similar to above, the expression 2.2 * weight_kg is evaluated to 143.0, and then this value is assigned to the variable weight_lb (i.e. the sticky note weight_lb is placed on 143.0). At this point, each variable is “stuck” to completely distinct and unrelated values.

Let’s now change weight_kg:

In [None]:
weight_kg = 100.0
print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)


![Value of 100.0 with label weight_kg stuck on it, and value of 143.0 with label weight_lb stuck on it](https://swcarpentry.github.io/python-novice-inflammation/fig/python-sticky-note-variables-03.svg)

Since weight_lb doesn’t “remember” where its value comes from, it is not updated when we change weight_kg.
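
To bring weight_lb up to date, the calculation has to be run again after weight_kg changes. A short sketch of the whole sequence, where `stale_weight_lb` is a hypothetical extra name used only to keep the old value visible:

```python
weight_kg = 65.0
weight_lb = 2.2 * weight_kg    # about 143 pounds

weight_kg = 100.0              # this line does not touch weight_lb
stale_weight_lb = weight_lb    # still the value computed from 65.0

weight_lb = 2.2 * weight_kg    # re-run the calculation to update it
print('old:', stale_weight_lb, 'new:', weight_lb)
```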

### Libraries

[Common libraries cheatsheet](https://www.python-graph-gallery.com/cheat-sheets/)

[Pandas cheatsheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)

A library is a collection of reusable code, usually a set of functions (sometimes precompiled), that a program can call later on to carry out specific, well-defined operations.

- A library is a collection of files (called modules) that contains functions for use by other programs.
  - May also contain data values (e.g., numerical constants) and other things.
  - A library’s contents are supposed to be related, but there’s no way to enforce that.
- The Python standard library is an extensive suite of modules that comes with Python itself.
- Many additional libraries are available from PyPI (the Python Package Index).
- We will see later how to write new libraries.

Libraries provide additional functionality to the basic Python package, much like a new piece of equipment adds functionality to a lab space. Just like in the lab, importing too many libraries can sometimes complicate and slow down your programs - so we only import what we need for each program.

Once we’ve imported the library, we can ask the library to read our data file for us.

<span style="color:red">**A program must import a library module before using it**</span>

#### Pandas

What it does: Provides access to efficient data structures for structured and time-series data. Pandas is a widely used Python library for statistics, particularly on tabular data. It borrows many features from R’s data frames.

A brief rundown of the features offered by Pandas includes:

- An efficient DataFrame object for data manipulation

- Easy reshaping and pivoting of data sets

- Merging and joining of data sets

- Label-based data slicing, indexing, and subsetting

- Allows working with time-series data

- Tools for reading and writing data between in-memory data structures and multiple file formats (Source: towards data science)

#### Seaborn

What it does: Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures.

Seaborn helps you explore and understand your data. Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements of your plots mean, rather than on the details of how to draw them. (Source: seaborn)

#### NumPy

What it does: Provides access to N-dimensional arrays and other useful numerical tools.

The two vital benefits that NumPy has to offer are its support for powerful N-dimensional array objects and its built-in tools for performing intensive mathematical and scientific calculations. (Source: towards data science)

#### PyTorch

What it does: Provides tools and libraries for developing GPU-powered machine learning applications.

PyTorch is being used more commonly to research, develop, and deploy applications that leverage advanced technologies like Computer Vision and Natural Language Processing. If needed, PyTorch can also pair well with other powerful libraries like NumPy, SciPy, and Cython. (Source: towards data science)

#### Matplotlib

What it does: Helps developers create stunning visualizations.

Matplotlib is one of the most popular visualization libraries for Python. Used by hundreds of companies and individuals, it lets you visualize your data in several different ways. (Source: towards data science)


#### Scikit-learn

What it does: Scikit-learn is a well-known open-source Python library for working with complex data and for machine learning. It supports various supervised and unsupervised algorithms, such as linear regression, classification, and clustering. The library works in association with NumPy and SciPy. (Source: geeks for geeks)

##### Example

In [None]:
# Importing math library
import math
  
A = 16
print(math.sqrt(A))

<span style="color:red">**Use help to learn about the contents of a library module**</span>

In [None]:
help(math)
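
The full help text for a module can be long. As a lighter alternative, the built-in `dir` function lists the names a module defines, and `help` can also be called on a single function, for example `help(math.sqrt)`. The sketch below prints a function's documentation text directly; `public_names` is a name made up for this example.

```python
import math

# dir() returns the names defined in the module; skip the private ones.
public_names = [name for name in dir(math) if not name.startswith('_')]
print(public_names[:10])

# Every documented function carries its help text in __doc__.
print(math.sqrt.__doc__)
```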

**Check the libraries installed**

In [None]:
! pip list

- Use *import ... as ...* to give a library a short alias while importing it.
- Then refer to items in the library using that shortened name.
- Commonly used for libraries that are frequently used or have long names.
    - E.g., the matplotlib plotting library is often aliased as mpl.
- But can make programs harder to understand, since readers must learn your program’s aliases.
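
A minimal sketch of aliasing, using the `math` module from the example above. The alias `m` and the bolt head radius are assumptions made for this illustration; `m` is not a widely used convention.

```python
import math as m  # 'm' is an alias chosen only for this sketch

radius_mm = 4.0   # hypothetical bolt head radius in millimetres

# Items in the library are reached through the alias.
area_mm2 = m.pi * radius_mm ** 2
print(area_mm2)
```

After this import, the name `math` itself is not defined unless it has been imported separately; only the alias refers to the module.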

### Loading data

To begin processing data, we need to load it into Python. We can do that using the library pandas.

- Load it with import pandas as pd. The alias pd is commonly used for Pandas.
- Read a Comma Separated Values (CSV) data file with pd.read_csv.
    - Argument is the name of the file to be read.
    - Assign result to a variable to store the data that was read.


In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('bolts.csv')
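
If `bolts.csv` is not at hand, the same call can be tried on text held in memory. The sketch below assumes a made-up three-column layout (`id`, `thread_length`, `type`) that is not taken from the real file, and uses `io.StringIO` so that the text behaves like a file.

```python
import io

import pandas as pd

# Hypothetical CSV content; the real bolts.csv may look different.
csv_text = """id,thread_length,type
1,25.0,hex
2,30.5,hex
3,18.2,carriage
"""

# pd.read_csv accepts a file name or a file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df)
```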

### Investigating the data

In [None]:
df.head()

In [None]:
df.tail()

### Renaming columns

In [None]:
df.columns = ['ID', 'thread_l', 'thread_w', 'grip_l', 'head_l', 'type']

In [None]:
df.head()
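
Assigning a list to `df.columns` replaces every name by position, so the list needs exactly one entry per column. As an alternative sketch, `DataFrame.rename` changes only the columns named in a dictionary and returns a new DataFrame. The small table and its original column names below are hypothetical.

```python
import pandas as pd

# Hypothetical two-column table, used only for this illustration.
sample_df = pd.DataFrame({
    'Thread Length': [25.0, 30.5],
    'Bolt Type': ['hex', 'carriage'],
})

# Map old names to new names; columns not listed keep their names.
renamed_df = sample_df.rename(
    columns={'Thread Length': 'thread_l', 'Bolt Type': 'type'}
)

print(list(sample_df.columns))   # the original table is left unchanged
print(list(renamed_df.columns))
```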

### Check dataset

In [None]:
type(df)

In [None]:
df.info()

In [None]:
df.dtypes

In [None]:
df.shape

In [None]:
len(df)

### Magic commands in Jupyter Notebook

You can use the **%whos** command at any time to see what variables you have created and what modules you have loaded into the computer’s memory. As this is an IPython command, it will only work if you are in an IPython terminal or the Jupyter Notebook.

In [None]:
%whos

The **%pwd** command shows the current working directory, which by default is the directory in which this notebook was opened. The name stands for "print working directory".

In [None]:
%pwd

The **%ls** command lists the contents of your current directory.

In [None]:
%ls

The **%lsmagic** command lists all the magic commands that are available.

In [None]:
%lsmagic
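
Commands that start with `%` only work inside IPython or Jupyter. As a rough plain-Python counterpart, the standard library's `os` module offers similar information. This is a sketch of the idea, not a description of what the magic commands do internally, and the variable names are chosen only for this example.

```python
import os

# Similar information to %pwd: the current working directory.
current_directory = os.getcwd()
print(current_directory)

# Similar information to %ls: the entries in that directory.
entries = sorted(os.listdir(current_directory))
print(entries)
```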


#### Key Points

- Basic data types in Python include integers, strings, and floating-point numbers.

- Use variable = value to assign a value to a variable in order to record it in memory.

- Variables are created on demand whenever a value is assigned to them.

- Use print(something) to display the value of something.

- Built-in functions are always available to use.

- Use variables to store values.

- Use print to display values.

- Variables persist between cells.

- Variables must be created before they are used.

- Variables can be used in calculations.

- IPython magic commands such as %whos, %pwd, and %ls can be used in Jupyter Notebook.