# Introduction to Python

In this workshop, we will use the Python programming language for hands-on demonstrations of social media data collection and analysis. Knowledge of Python is not required, as you are not expected to write any code. Rather, we hope to give you a glimpse of the options available. We chose Python because it has broad community support, offers many libraries and packages, and is widely used for data science. However, many of the options we show can also be used with R or other languages. If you have any questions about the preparation, feel free to email indira.sen@gesis.org.

## Setting up Python

Make yourself acquainted with the notebook environment. It's basically a webpage with executable code. Code is run by clicking the "run cell" button (eighth from the left).

Optional: There are many great keyboard shortcuts. Press 'H' to see a cheat sheet.

A good introduction is [this video right here](https://www.youtube.com/watch?v=HW29067qVWk). 

For this workshop, you don't need to write your own code. We will provide notebooks which you can run in the same way as this prep notebook.

### Output

Jupyter Notebook displays the value of the last line in a cell:

In [None]:
1+1

In [None]:
1+1
"Hello!"

Using 'print' explicitly allows multiple print-outs per cell. Output values with Python's print function:

In [None]:
print(1+1)
print("Hello!")
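
As a small sketch of how the two kinds of output combine, the cell below prints two lines explicitly and then ends with a bare expression. The variable name `result` is only an illustration and does not appear elsewhere in this notebook.

```python
result = 1 + 1

# print accepts several values and separates them with a space
print("The result is", result)

# an f-string places values directly inside the text
print(f"{result} squared is {result ** 2}")

# the value of the last line; a notebook displays it without print
result
```

In a notebook, this cell shows both printed lines and then the value of `result` as the cell output.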

### Imports

A central building block of Python, and especially of the Anaconda distribution you should have installed, is the ability to import additional modules, packages, or libraries into your current script with the 'import' command.

One of Python's advantages is its great library support: many different packages are available, and we will be using many of them throughout the workshop.

In [None]:
import math

In [None]:
math.log(4)

In [None]:
math.cos(math.pi)
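
One detail that may be worth knowing here: called with a single argument, `math.log` returns the natural logarithm (base e), not the base-10 logarithm. The following sketch shows how other bases can be requested; the variable names are chosen only for illustration.

```python
import math

natural = math.log(4)         # natural logarithm (base e)
base_two = math.log(4, 2)     # second argument sets the base
base_ten = math.log10(1000)   # dedicated base-10 function

print(natural)
print(base_two)
print(base_ten)
```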

Sometimes you will want to use a short name for a library:

In [None]:
import math as mt

In [None]:
mt.log(4)

Note that you have to type the module name ("math" or "mt") before each function call.

You can also import a specific function from a module. Then you can call it without the module prefix:

In [None]:
from statistics import mean

In [None]:
mean([2, 5, 6, 100])
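
Several names can be imported from the same module in one statement. As a brief sketch, the example below also imports `median` from the standard `statistics` module and compares it with the mean of the same list; the variable name `values` is an illustration.

```python
from statistics import mean, median

values = [2, 5, 6, 100]

# the mean is pulled upwards by the single large value 100
print(mean(values))     # 28.25

# the median is the middle of the sorted values
print(median(values))   # 5.5
```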

Now that we know the basics of importing, make yourself comfortable with using multiple libraries. NumPy, Pandas, and NetworkX are only three of the ones we will be using in the course.

However, in our introductory tutorials on Python fundamentals, we will use only basic functions of Python.

### NumPy

NumPy is the fundamental package for scientific computing with Python. More information and tutorials at:

http://www.numpy.org/

In [None]:
import numpy as np

An example command:

In [None]:
x = [2, 5, 6, 100]
np.mean(x)
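
Beyond single functions such as `np.mean`, NumPy is mostly used through its array type. The sketch below assumes NumPy is installed (it ships with Anaconda) and shows two typical operations on the same numbers; the names `doubled` and `above_mean` are made up for this illustration.

```python
import numpy as np

# convert the list into a NumPy array
x = np.array([2, 5, 6, 100])

# arithmetic is applied to every element at once
doubled = x * 2

# a comparison yields True/False per element, which can select values
above_mean = x[x > x.mean()]

print(doubled)
print(above_mean)
```

Here `doubled` contains 4, 10, 12 and 200, and `above_mean` contains only 100, since the mean of the four values is 28.25.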

### Pandas

Pandas provides data structures and data analysis tools. More information and tutorials at:

http://pandas.pydata.org/

Import the pandas module under the alias pd:

In [None]:
import pandas as pd

An example command:

In [None]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s
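
The series above contains one missing value (`np.nan`). As a minimal sketch, assuming pandas and NumPy are installed, the following cell shows a few common ways of dealing with it; the names `missing`, `avg`, and `cleaned` are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

# count the missing entries
missing = s.isna().sum()

# by default, mean() skips missing values
avg = s.mean()

# dropna() returns a new series without the missing entries
cleaned = s.dropna()

print(missing, avg, len(cleaned))
```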

## Getting started with Python

### "Hello World"

Let's say you wanted to print "Hello Python World". How would you do it? One simple option is to use the notebook's feature of displaying the value of the last line.

In [None]:
"Hello Python World"

Or we could instead call the print function explicitly.

In [None]:
print("Hello Python World")

Or we could make use of a variable that can hold our message.

In [None]:
message = "Hello Python world!"
print(message)

A variable holds a value. You can change the value of a variable at any point ('reassign' the variable).

In [None]:
message = "Hello Python world!"
print(message)

message = "Python is my favorite language!"
print(message)
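
Variables can also be combined to build new values. The following sketch uses two extra variables, `greeting` and `audience`, which are introduced here purely for illustration. It also shows that reassigning one variable does not change values that were already built from it.

```python
greeting = "Hello"
audience = "Python world"

# build the message from the two parts
message = greeting + " " + audience + "!"
print(message)   # Hello Python world!

# reassigning audience does not alter the existing message
audience = "workshop"
print(message)   # Hello Python world!

# the message only changes once it is reassigned as well
message = f"{greeting} {audience}!"
print(message)   # Hello workshop!
```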

## Optional

Here are some optional materials for Python and data science. You are not expected to work through them for the workshop, but if you are interested, check them out.

## Materials for Learning Python

Via <a href='https://notebooks.gesis.org/binder/v2/gh/gesiscss/PythonDataScienceHandbook/master'>this link</a> you can open the Python Data Science Handbook project and work through this complete data science textbook: **VanderPlas, J. (2016): *Python Data Science Handbook: Essential Tools for Working with Data*. O'Reilly Media.** Once in the Jupyter Notebook environment, navigate into the "notebooks" folder and open the "index.ipynb" file.

A fine introduction for newcomers with a focus on data handling is: **McKinney, W. (2012): *Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython*. O'Reilly Media.** This book does not go as deep into the kind of data analysis we will be dealing with as the book by VanderPlas.

This is a data science textbook from the perspective of the social sciences: **Foster, I., Ghani, R., Jarmin, R. S., Kreuter, F., and Lane, J. (eds.) (2016): *Big Data and Social Science: A Practical Guide to Methods and Tools*. Chapman and Hall/CRC Press.**

Finally, more basic tutorials can be found <a href='https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks#introductory-tutorials'>here</a>.

## Additional Resources for Data Analysis 

An example machine learning notebook: https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects/blob/master/example-data-science-notebook/Example%20Machine%20Learning%20Notebook.ipynb

Statistics visualisations with JavaScript: http://students.brown.edu/seeing-theory/

Coursera, e.g.: https://www.coursera.org/browse/data-science

Berthold, M. and Hand, D. J. (eds.) (2002): *Intelligent Data Analysis: An Introduction*. Springer.

Bishop, C. (2006): *Pattern Recognition and Machine Learning*. Springer.

Ester, M. and Sander, J. (2000): *Knowledge Discovery in Databases: Techniken und Anwendungen*. Springer. **In German**.

Hastie, T., Tibshirani, R., and Friedman, J. (2001): *The Elements of Statistical Learning*. Springer.

Han, J. and Kamber, M. (2011): *Data Mining: Concepts and Techniques*. Morgan Kaufmann Publishers.

Mitchell, T. M. (1997): *Machine Learning*. McGraw-Hill.

Witten, I. H. and Frank, E. (2005): *Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations*. Morgan Kaufmann Publishers.