# Learning python - Basic Concepts and Syntax

## Outline:

- How to use python
- Change from R to python
- Variables
- Data types
- Basic operations

## A cheat sheet

https://www.codecademy.com/learn/learn-python-3/modules/learn-python3-hello-world/cheatsheet

## How do we run python code?

Three main ways to run python code:

- In a python interpreter by typing commands directly after starting python with the `python` command on the command line
- In a python script by typing commands into a text file and then running the script with the `python` command on the command line
- In a Jupyter notebook by typing commands into a cell and then running the cell

#### Python interpreter

Type `python` into the terminal and hit enter. You should see something like this:

`>>> `

This interface lets you run python code directly. Try typing `print("Hello world!")` and hitting enter. You should see the text `Hello world!` printed to the screen.

This is usually used to try out python code interactively.

Type `quit()` and hit enter to exit the python interpreter.

#### Python script

Now let's try running a python script. At the top of VSCode, double click in the open space next to the welcome tab.

Check out `demo_script.py`.

You can run this with `python demo_script.py`.

It will execute any code inside your script.

#### Jupyter notebook

In the top left of VSCode, click `File` -> `New File`. Then pick Jupyter Notebook to make a new notebook.

You'll likely see a prompt asking for a "python kernel". Pick the python environment John helped set up last time. This is the version of python we're using.

# Comparing R to Python

## No braces! Only tabs!

Python uses indentation (tabs or spaces) to define blocks of code instead of braces `{}` like in R. This means that the structure of your code is determined by how you indent it.
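
As a small illustration (the variable names and values here are made up for the example), the indented lines below belong to the `if` block, and the first line back at the original indentation ends it:

```python
temperature = 30  # hypothetical value for illustration

messages = []
if temperature > 25:
    # Both indented lines are inside the if block
    messages.append("It is warm")
    messages.append("Wear sunscreen")
# Back at the original indentation: this line runs no matter what
messages.append("Done checking")

print(messages)
```

Most editors insert four spaces when you press tab. Mixing tabs and spaces inside one block raises an error, so it is safest to stick with one.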

## Functional vs Object-Oriented Programming

- **R**: R is primarily a functional programming language. This means that functions are the primary way to manipulate data and perform operations. In R, you often pass data to functions, which return new data as output.
  - Example: `mean(c(1, 2, 3))` calculates the mean of a vector.

- **Python**: Python is an object-oriented programming language. While it supports functional programming, Python emphasizes the use of objects and methods. Objects are instances of classes, and methods are functions that belong to these objects.
  - Example: In Python, you might use a method like `my_list.append(4)` to modify a list.

## What is object oriented programming?

Object-oriented programming (OOP) is a programming paradigm that uses "objects" to represent data and methods to manipulate that data. In OOP, you define classes, which are blueprints for creating objects. Each object can have its own attributes (data) and methods (functions) that operate on that data. You (mostly) won't have general functions that operate on lots of different data types; instead, each data type (class) will have its own methods. One tricky bit about learning to work with OOP is learning the methods associated with the classes you want to use.

Methods are part of the "interface" of the object. They are functions that directly use, affect, and make use of the information in an object but are defined by the class.

If a `Pet` class defines a `speak()` method, each kind of pet can respond in its own way: `dog.speak()` might return `"woof"` while `cat.speak()` returns `"meow"`.
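
A minimal sketch of how the `speak()` idea could look in code. The `Pet`, `Dog`, and `Cat` classes and the pet names are hypothetical and exist only for this example:

```python
class Pet:
    """Hypothetical base class: every pet has a name and can speak."""

    def __init__(self, name):
        self.name = name  # attribute (data stored on the object)

    def speak(self):
        # Default behavior; the subclasses below override this method
        return "..."


class Dog(Pet):
    def speak(self):
        return "woof"


class Cat(Pet):
    def speak(self):
        return "meow"


dog = Dog("Rex")
cat = Cat("Tom")

print(dog.name, "says", dog.speak())
print(cat.name, "says", cat.speak())
```

The same method name, `speak()`, gives a different result depending on which class the object belongs to.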

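`dir({object})` will return all of the fields and methods of an object. Names of the form `__{name}__` are special ("dunder") methods that Python calls for you (for example, `__len__` is what `len()` uses), so you generally don't call them directly. By convention, names with a single leading underscore, `_{name}`, are "private" and are not intended to be part of the interface.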

### - Key point -

If you come from an R background, this will require a major change in thinking about how to code!

In R, we typically pass data to a function, e.g. `toupper(x)` to upper-case a string. In Python, the same operation is a method attached to the object: `x.upper()`. The operations you're going to want to use are often attached to objects!

Most of the time, when doing things in python, you're going to be making objects and then using `object.do_the_thing()` to perform your analyses. You're going to use functions to do things much less often. If you want to see what methods an object has available, you can use `dir(object)` to see them all or type the name of the object, then a period and hit tab a few times to see what methods are available in your IDE.
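
As a rough illustration of `object.method()` and `dir()`, here is a plain string used as the object (the variable names are made up for the example):

```python
greeting = "hello world"

# Methods are called on the object itself
shouted = greeting.upper()
words = greeting.split()

print(shouted)
print(words)

# dir() lists everything attached to the object; skipping names that
# start with an underscore leaves the everyday methods
public_methods = [name for name in dir(greeting) if not name.startswith("_")]
print(public_methods[:5])
```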

### Some brief definitions

- **Class**: A blueprint for creating objects. Defines a set of attributes and methods that the created objects will have.
- **Object**: An instance of a class. Contains data (attributes) and can perform actions (methods).
- **Method**: A function that is associated with an object. Defined within a class and can access the object's attributes.

## Methods vs Functions

- In R, most operations are performed using standalone functions. For example, you might use the same function, `summary()` to summarize a dataframe, vector, or something else.
- In Python, many operations are performed using methods, which are functions tied to specific objects. For example, instead of a standalone function, you might use `dataframe.describe()` to summarize a dataset.
- This reflects Python's object-oriented nature, where objects encapsulate both data and behavior.
- Remember that each object has its own set of methods that are relevant to that object's data type.

## Common python functions
- `len()` - Returns the length of an object (e.g., list, string, dictionary).
- `type()` - Returns the type of an object.
- `print()` - Outputs text or variables to the console.
- `dir()` - Returns a list of attributes and methods of an object.
- `range()` - Generates a sequence of numbers, often used in loops.
- `enumerate()` - Adds a counter to an iterable and returns it as an enumerate object.
- `zip()` - Combines multiple iterables (e.g., lists, tuples) into tuples.
- `map()` - Applies a function to all items in an iterable (e.g., list) and returns a map object.
- `filter()` - Filters items in an iterable based on a function that returns True or False.

- *args and **kwargs for functions
    - `*args` allows a function to accept any number of **positional arguments**
    - `**kwargs` allows a function to accept any number of **keyword arguments**
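
A short sketch that exercises most of the functions listed above. The lists and the `describe()` function are hypothetical and only serve the example:

```python
fruits = ["apple", "banana", "cherry"]
prices = [0.5, 0.25, 3.0]

print(len(fruits))       # number of items in the list
print(type(prices[0]))   # the type of a single item

# range() produces numbers on demand; list() shows them all
print(list(range(3)))

# enumerate() pairs each item with a counter
for index, fruit in enumerate(fruits):
    print(index, fruit)

# zip() pairs up items from several iterables
price_pairs = list(zip(fruits, prices))
print(price_pairs)

# map() applies a function to every item; filter() keeps items that pass a test
doubled = list(map(lambda p: p * 2, prices))
cheap = list(filter(lambda p: p < 1, prices))
print(doubled)
print(cheap)


def describe(*args, **kwargs):
    """Hypothetical function that accepts any arguments."""
    # args is a tuple of positional arguments, kwargs is a dict of keyword arguments
    return f"{len(args)} positional, {len(kwargs)} keyword"


summary = describe(1, 2, 3, unit="cm", rounded=True)
print(summary)
```

`map()`, `filter()`, `zip()`, and `enumerate()` return lazy objects, which is why the example wraps them in `list()` before printing.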

## Object.Method() Syntax

- Python's object-oriented design encourages the use of the `object.method()` syntax. This means that operations are often performed directly on objects using methods.
  - Example: Instead of using a function like `sort(vector)` in R, Python uses `my_list.sort()` to sort a list in place.

## In-Place Modification vs Assignment

- **R**: In R, most operations return a new object rather than modifying the original object. This means you often need to assign the result back to the variable to update it.
  - Example: `x <- sort(x)` creates a new sorted vector and assigns it back to `x`.

- **Python**: Python allows both in-place modification and assignment, depending on the method or operation used.
  - In-place modification: `my_list.append(4)` modifies the list object directly.
  - By assignment: `my_list2 = sorted(my_list)` creates a new sorted list and assigns it to `my_list2`. The original `my_list` remains unchanged.
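
The difference is easy to see side by side. This is a small sketch using a made-up list; note that in-place methods such as `.sort()` return `None`, so writing `my_list = my_list.sort()` would throw the list away:

```python
my_list = [3, 1, 2]

# sorted() returns a new list and leaves the original alone
my_list2 = sorted(my_list)
print(my_list)
print(my_list2)

# .sort() changes the list in place and returns None
result = my_list.sort()
print(my_list)
print(result)
```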

## !!! **Indexes** !!!
R uses 1-based indexing, meaning that the first element of a vector or list is accessed with index 1. Python uses 0-based indexing, meaning that the first element is accessed with index 0. This difference can lead to off-by-one errors when translating code between the two languages.
  - `[1, 2, 3][1]` gives the number 2 in Python, while `c(1, 2, 3)[1]` gives 1 in R
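
A few more indexing cases, sketched with a made-up list of letters:

```python
letters = ["a", "b", "c", "d"]

first = letters[0]      # first element
second = letters[1]     # second element
last = letters[-1]      # negative indexes count from the end
middle = letters[1:3]   # slice: starts at index 1, stops before index 3

print(first, second, last)
print(middle)
```

Negative indexes are another spot where the languages differ: in R, a negative index drops that element instead of counting from the end.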

## Python uses some bare words for comparisons
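Python spells out its logical, membership, and identity operators as words: `and`, `or`, `not`, `in`, `is` (where R uses symbols such as `&`, `|`, and `!`). Greater than, less than, and equals still use symbols: `>`, `<`, `==`.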

# Variable assignment

In python, we assign a value to a variable using the `=` operator. The variable name goes on the left, and the value goes on the right. We don't use `<-` like in R.

In [1]:
my_name = 'Matt'

my_name

'Matt'

# Basic Data Types in Python

## Numeric Types

### Integer
An integer is a whole number, positive or negative. When declaring an integer you don't put it in quotes. For example:

In [2]:
x = 10

### Float
A float is a number that has a decimal point. You also don't put it in quotes. For example:

In [3]:
y = 20.5

### Float math vs int math

With floats, you can do math the way you're used to. The result will be a float.

In [4]:
first_num = 10.0
second_num = 3.3333

print(first_num + second_num)
print(first_num / second_num)

13.3333
3.000030000300003


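With integers, you're doing math with whole numbers. Addition, subtraction, multiplication, and floor division (`//`) of integers give an integer, and floor division always rounds down. Regular division (`/`) always gives a float, even when both numbers are integers.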

In [5]:
first_int = 7
second_int = 2

print(first_int // second_int) # integer division
print(first_int / second_int) # this one gets converted to float

3
3.5


### Boolean

A boolean is a variable that can only be True or False. You don't put it in quotes. Note that unlike R's `TRUE` and `FALSE`, only the first letter is capitalized.

In [6]:
matt_is_awesome = True
print(matt_is_awesome)

sardines_are_awesome = False
print(sardines_are_awesome)

True
False


### String
A string is a sequence of characters. For example:

In [7]:
my_words = 'Hello, World!'
print(my_words)

Hello, World!


# String Formatting

- Use an f-string, `f"{value}"`. This is the preferred way.
- Use the `"{0}".format(value)` method. This is the older way.

In [8]:
my_string = 'It works'
my_float = 100.00
my_integer = 30

print(f"{my_string} {my_float}% of the time, {my_integer}% of the time")
print("{0} {1}% of the time, {2}% of the time".format(my_string, my_float, my_integer))

It works 100.0% of the time, 30% of the time
It works 100.0% of the time, 30% of the time


# Basic variable types

## List
A list is a collection of items. You declare a list with square brackets.

You can refer to an item in a list by its index. **The index starts at 0.**

Three important things to note about lists:
- Lists are ordered. The order of the items in the list is important.
- Lists can contain any type of data, including other lists.
- Lists can be altered after you create them.

In [9]:
list_of_nums = [1, 2, 3, 4, 5]
list_of_stuff = ['bob', 1, 2.0, True]
print(list_of_stuff)

list_of_stuff.append(list_of_nums)
print(list_of_stuff)

list_of_stuff.append('matt')
print(list_of_stuff)

print(list_of_stuff[1])

['bob', 1, 2.0, True]
['bob', 1, 2.0, True, [1, 2, 3, 4, 5]]
['bob', 1, 2.0, True, [1, 2, 3, 4, 5], 'matt']
1


## Tuple
A tuple is similar to a list in most regards, except that it can't be changed.
You declare a tuple with parentheses.

Generally, unless you know you'll need to change it later, you should use a tuple instead of a list.


In [11]:
important_numbers = (1, 2, 3, 4, 5)
important_numbers[0] = 10 # This will fail

TypeError: 'tuple' object does not support item assignment

In [None]:
print(important_numbers)
print(important_numbers[0])

(1, 2, 3, 4, 5)
1


## Dictionary
A dictionary is a collection of key-value pairs.

You provide both a key (the name) and a value in the format `key: value`.

You declare a dictionary with curly braces `{}`. For example:
```python
my_dict = {'key1': 'value1', 'key2': 'value2'}
```

You can retrieve a value from a dictionary by using the key, which is much faster than looping through a list. For example:
```python
print(my_dict['key1'])  # Outputs: value1
```

### `dict()` Constructor vs `{}`
- **Using `{}`**: This is the most common way to define a dictionary. It is concise and allows you to directly specify key-value pairs.
  ```python
  my_dict = {'a': 1, 'b': 2}
  ```

- **Using `dict()`**: This is a constructor method that can be used to create dictionaries. It is particularly useful when creating dictionaries from sequences or keyword arguments.
  ```python
  my_dict = dict(a=1, b=2)  # Equivalent to {'a': 1, 'b': 2}
  ```

  You can also use `dict()` to convert other data structures (like lists of tuples) into dictionaries:
  ```python
  my_dict = dict([('a', 1), ('b', 2)])
  ```

In [None]:
student_grades = {'John': 'A', 'Ali': 'B+', 'Matt': 'D'}

print(student_grades['Matt'])

student_grades['Dave Grohl'] = 'A+'

print(student_grades.get('Dave Grohl'))

dict_keys = student_grades.keys()
print(list(dict_keys))

D
A+
['John', 'Ali', 'Matt', 'Dave Grohl']


## Set
A set is a collection of unique items. Even if you add the same item twice, it will only be stored once.

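You declare a set with curly braces. Empty braces `{}` create an empty dictionary, so use `set()` to make an empty set. Sets are unordered, so items may print in a different order than you typed them.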

In [None]:
set_of_states = {'CA', 'OR', 'WA', 'NV', 'AZ', 'UT', 'ID', 'MT', 'WY', 'CO', 'NM'}

print(set_of_states)

set_of_first_names = {'Matt', 'John', 'Ali', 'Matt'}
print(set_of_first_names)

{'NM', 'WA', 'CA', 'UT', 'MT', 'NV', 'AZ', 'WY', 'ID', 'OR', 'CO'}
{'Matt', 'John', 'Ali'}


# Comparison operators

Evaluate to either True or False

- `==` Equals
- `>`  Greater than
- `>=` Greater than or equal to
- `<`  Less than
- `<=` Less than or equal to
- `!=` Not equal to
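
A quick sketch of these operators in use (the values are made up for the example):

```python
x = 5
y = 10

print(x == y)   # False
print(x != y)   # True
print(x < y)    # True
print(x >= 5)   # True

# The result is an ordinary boolean that can be stored in a variable
x_is_smaller = x < y
print(type(x_is_smaller))

# Strings use the same operators; the comparison is case-sensitive
same_name = "Matt" == "matt"
print(same_name)  # False
```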

# Arithmetic Operators #

- ` + ` addition for numbers, concatenation for strings
- ` += ` add the right hand side to the value already stored in the left hand side
- ` - ` subtraction
- ` * ` multiplication, repeating for strings
- ` @ ` reserved for matrix multiplication -- doesn't work on built in types
- ` / ` division -- will always return a float
- ` // ` floor division -- will return an integer if only integers are used, rounded down
- ` % ` modulo -- returns the integer remainder of division between integers
- ` ** ` exponentiation


If either member of an expression is a float the other member will be converted to a float before the operation.
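
The operators above, sketched with small made-up values. The `@` operator is left out because it needs a library type such as a matrix:

```python
total = 10
total += 5                 # same as total = total + 5
print(total)               # 15

joined = "ab" + "cd"       # concatenation for strings
repeated = "ab" * 3        # repetition for strings
print(joined, repeated)    # abcd ababab

true_division = 7 / 2      # always a float
floor_division = 7 // 2    # integer result, rounded down
negative_floor = -7 // 2   # rounds down, toward negative infinity
remainder = 7 % 2
power = 2 ** 10
print(true_division, floor_division, negative_floor, remainder, power)

# Mixing an int and a float gives a float
mixed_sum = 3 + 1.5
mixed_floor = 7 // 2.0
print(mixed_sum, mixed_floor)  # 4.5 3.0
```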

# Comparison Keywords

- `and` -- returns True if both sides are True
- `or`  -- returns True if either or both sides are True
- `not` -- reverses the following True or False

In [None]:
True and True

True or False

not False

# Identity Comparison

- `is`     -- returns True if the left and right objects are **literally the same object**
- `is not` -- returns True if the left and right objects are **separate objects**
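
A small sketch of how `is` differs from `==`, using made-up lists. Two lists can hold equal contents and still be separate objects:

```python
list_a = [1, 2, 3]
list_b = [1, 2, 3]   # same contents, but a separate object
list_c = list_a      # a second name for the same object

print(list_a == list_b)      # True: the contents are equal
print(list_a is list_b)      # False: two different objects
print(list_a is list_c)      # True: literally the same object
print(list_a is not list_b)  # True

# Because list_c and list_a are the same object, a change through one
# name is visible through the other
list_c.append(4)
print(list_a)  # [1, 2, 3, 4]

# The most common everyday use of "is" is checking for None
value = None
print(value is None)  # True
```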

# Membership Comparison

- `in`     -- returns True if left hand is a member of right hand collection
- `not in` -- returns True if left hand is not a member of right hand collection
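
Membership checks work on most collections, and on strings too. A short sketch with made-up data:

```python
states = ["CA", "OR", "WA"]
has_oregon = "OR" in states
lacks_texas = "TX" not in states
print(has_oregon, lacks_texas)  # True True

# For dictionaries, "in" checks the keys, not the values
grades = {"John": "A", "Ali": "B+"}
john_is_key = "John" in grades
a_is_key = "A" in grades
print(john_is_key, a_is_key)    # True False

# For strings, "in" checks for a substring
has_substring = "ell" in "Hello"
print(has_substring)            # True
```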

# Loops
Note that the first line ends with a colon `:` and the next line is indented. All following indented lines are part of the loop until the indentation ends. Blank lines don't end the loop: lines that are indented after a blank line are still part of it.

## For loop

In [21]:
my_tuple = (1, 2, 3, 4, 5)


for this_number in my_tuple:
    print(this_number)

    print(this_number * 2)

1
2
2
4
3
6
4
8
5
10


## List comprehension is basically a for loop in a single line

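A list comprehension is more concise, and usually a bit faster, than building the same list with a `for` loop and `append()`.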

In [4]:
squared_numbers = [x**2 for x in range(1, 6)]

print(squared_numbers)

[1, 4, 9, 16, 25]


## While loop

In [22]:
max_number = 4
number_var = 0
while number_var < max_number:
    print("not there yet!")
    number_var += 1

print("We made it! This is outside of the loop.")
print(f"Note that number_var is now {number_var} and still exists.")

not there yet!
not there yet!
not there yet!
not there yet!
We made it! This is outside of the loop.
Note that number_var is now 4 and still exists.


## If/else statements

In [None]:
bobs_mom = 'Trisha'
if bobs_mom == 'Betty':
    print('Bob gets presents from Betty')
elif bobs_mom == 'Linda':
    # Only checked when the first condition was False
    print('Bob gets presents from Linda')
else:
    # Runs when none of the conditions above were True
    print(f"No presents from anyone since {bobs_mom} doesn't give presents")

# Opening a file to read
Opening the file with a `with` statement means the file is closed automatically once we exit the block. Variables created inside the block (like `my_data` below) persist outside of it.

In [10]:
# Get current directory
import os

current_directory = os.getcwd()
print("Current Directory:", current_directory)

with open("exercising.tsv", "r") as file:
    my_data = file.read()

print(my_data)

Current Directory: /gpfs0/home2/gdrobertslab/mvc002/bioinformatics_meeting/python_intro/session_2_datatypes_variables_concepts
Person	Month	Exercise	Count
Alice	January	Running	12
Alice	January	Cycling	8
Alice	January	Swimming	5
Alice	January	Yoga	10
Alice	January	Weightlifting	7
Alice	February	Running	15
Alice	February	Cycling	6
Alice	February	Swimming	9
Alice	February	Yoga	11
Alice	February	Weightlifting	4
Bob	March	Running	10
Bob	March	Cycling	12
Bob	March	Swimming	8
Bob	March	Yoga	7
Bob	March	Weightlifting	9
Charlie	April	Running	14
Charlie	April	Cycling	10
Charlie	April	Swimming	6
Charlie	April	Yoga	8
Charlie	April	Weightlifting	5
Diana	May	Running	9
Diana	May	Cycling	11
Diana	May	Swimming	7
Diana	May	Yoga	12
Diana	May	Weightlifting	10
Eve	June	Running	13
Eve	June	Cycling	9
Eve	June	Swimming	10
Eve	June	Yoga	6
Eve	June	Weightlifting	8
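
To check that `with` really closes the file, and to show one possible next step with the text that was read, here is a self-contained sketch. It writes its own tiny tab-separated file into a temporary folder, so the file name and contents are made up and unrelated to `exercising.tsv`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_dir:
    file_path = os.path.join(temp_dir, "example.tsv")

    # Write a small tab-separated file to read back
    with open(file_path, "w") as file:
        file.write("Person\tCount\nAlice\t12\nBob\t10\n")

    with open(file_path, "r") as file:
        my_data = file.read()

# The file is closed, but my_data is still available
file_is_closed = file.closed
print(file_is_closed)

# Split the text into rows, then each row into columns
rows = [line.split("\t") for line in my_data.splitlines()]
print(rows)
```

Everything read from a text file is a string, so numeric columns need converting with `int()` or `float()` before doing math on them.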


# Custom functions
- You can create your own functions using the `def` keyword.
- This lets you package up code to be reused later.
- This also limits variable scope to inside the function.

In [20]:
import random

def mess_up_my_numeric_data(data):
    messed_up = data * (42 + 3.14 + random.randint(1, 10))
    return messed_up

result = mess_up_my_numeric_data(10)
print(result)

501.4
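
The point about variable scope can be sketched like this. The `add_tax()` function and its default rate are hypothetical; the variable `tax` exists only while the function runs:

```python
def add_tax(price, rate=0.25):
    """Hypothetical helper: return the price with tax added."""
    tax = price * rate  # tax exists only inside the function
    return price + tax


total = add_tax(100)
print(total)  # 125.0

# Outside the function, tax is not defined
try:
    tax
    tax_visible = True
except NameError:
    tax_visible = False

print(tax_visible)  # False
```

Anything the rest of your code needs from a function has to come back through `return`.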
