# Overview

organized by *Paul Squires and Shannon Tubridy*

thanks to *Todd Gureckis* for providing open licensed materials


- This notebook begins to introduce Python.
    - Introducing letters and strings of letters.
    - Working with strings: indexing, slicing, text manipulation functions.
    - Introducing numbers: integers and floats.
    - Basic math operations.



## About Jupyter Notebooks

For a reminder about interacting with Jupyter notebooks, check the Jupyter Intro notebook.

<div class="alert alert-info">
<b>Putting text and comments in code cells</b>. If you wish to insert a code example in a code cell but not actually run the code when the code cell is run, precede the lines of code with one or more pound signs (#).

Similarly, you will occasionally see 'comments' in the code blocks that explain what's happening. These can be made by prefixing the comment text with the pound sign.
    
The two code cells below this show examples of doing this.
</div>

In [None]:
# s = "This is some python code that will not be run."
# print(s)

In [None]:
# define x and y and then see if they're equal:
x = 3
y = 4
z = 7
x == y

<div class="alert alert-info">
<b>Two usages of the # symbol</b> You have now seen that in Markdown cells the # symbol controls whether text should be treated as a header (making it large and bold) and in code cells the same symbol simply marks some text to not be executed as code.
</div>

# DATA TYPES - letters, strings, and numbers

A data type is a way of representing particular kinds of information like numbers, letters, and so on.

The next set of cells will introduce assigning different kinds of data to variables and working with those variables.

**Setting the Data Type**
 - The data/data structure type is set when you assign a value to a variable.
 - letters and words are stored as __strings__
 - numbers can be stored as __int__'s, __float__'s, and __complex__
    - int: an integer
    - float: a floating point number (numbers with decimal precision)
    - complex: numbers with a real and imaginary component


In [None]:
# Examples of defining some strings:
xstr = "Hello World"
ystr = 'a'
zstr = '1+3'


# And some numbers

# an integer, or whole number:
xint = 20

# a float, or number with floating point precision enabling 
# decimals:
xflo = 20.5

# assign the result of a mathematical operation to a variable:
zint = 1+3



### Strings

A string is a letter or ordered set of letters that can be treated as text. Python will treat any keystrokes surrounded by single ('hello', 'goodbye') or double ("hello", "goodbye") quotes as a string.

### Numbers

If a variable is declared or assigned a value that is a number without quotes around it, Python will infer the datatype int or float according to the presence of a decimal.

In [None]:
### the print() function can be used to "print" the contents of a variable
# to the cell output
# at minimum it takes a single input or argument: the thing you want to see

zstr = '1+3'
zint = 1+3

print(zstr)
print(zint)

In [None]:
print(xstr)
print(ystr)

In [None]:
# if an already existing variable is the last line in a cell
# you will see what's stored inside
xint

In [None]:
# jupyter only gives us the last entry as output if we don't use print
# in this case we will only get output for xint and not for xstr
xstr
xint

In [None]:
# use print() instead:
print(xstr)
print(xint)

In [None]:
# sometimes you need to check the data type of a variable 
# separate from its actual value
# use the type() function with your variable as input:
type(xflo)

In [None]:
type(ystr)
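
As a small sketch of how the type follows from the way a value is written, the cell below stores three values that all look like "twenty" and checks each with `type()`. The variable names `whole`, `decimal`, and `quoted` are made up for this illustration.

```python
# Python picks the type from how the value is written
whole = 20        # no decimal point -> int
decimal = 20.0    # decimal point -> float, even though the value is a whole number
quoted = '20'     # quotes -> str, even though it looks like a number

print(type(whole))
print(type(decimal))
print(type(quoted))
```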

**Setting the Specific Data Type:**
 - To explicitly specify or convert the data type, use the str(), int(), and float() functions to convert an input to the new type, as in the following cell.

**Explicitly setting the data type can be useful for things like turning a number into a string for use in a filename or turning an integer into a floating point number (we'll talk more about this last point later).**


In [None]:
# treat a number like a string or text:
xstr = str(3.14)
print("xstr:")
print(xstr)

# convert a float to an integer (the decimal part is dropped)
xint = int(20.5)
print("xint:")
print(xint)

# convert an integer to a float
# this is sometimes necessary if you are doing a math
# operation that requires maintaining decimal precision
# in the result
xflo = float(20)
print("xflo:")
print(xflo)
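
One detail worth checking for yourself: `int()` drops the decimal part rather than rounding. The sketch below compares it with the built-in `round()` function, which this notebook does not otherwise use. The variable names are only for illustration.

```python
# int() drops the decimal part; it does not round
truncated_pos = int(20.9)
truncated_neg = int(-20.9)   # truncates toward zero
print(truncated_pos)
print(truncated_neg)

# round() is the built-in to use when rounding is what you want
rounded = round(20.9)
print(rounded)
```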

<div class="alert alert-info">

<b>Note the use of str(), int(), and float() in the previous code cell</b>. Those are each examples of <b>functions</b> which are pre-defined blocks of code with a name.

Functions take inputs called parameters or arguments, execute some code using the inputs, and then usually return a result. Sometimes we will say that we *pass* an argument or parameter to a function, and sometimes we will say that we *call* a function when we use it.

You have already seen the print() function used to display the contents of a variable.
</div>

### Exercise

Define a variable to be a string of your choice. Choose a simple name for your variable and set it equal to some string. Then follow the previous example to convert it to a float. What happens? Are you able to find any useful message in the output?

### Keeping track of variables

After a while it can be hard to keep track of all the names and contents of your currently active variables.

To check a single variable you can do `print(variable_name)` and/or `type(variable_name)` to get the contents and/or type of some variable.

To see all your currently defined variables at once you can use the `%whos` command. The percent sign in front of whos marks it as a special "magic" command (see https://ipython.readthedocs.io/en/stable/interactive/magics.html).

In [None]:
# show all the variables currently available to the kernel
%whos

#  String Indexing
#### Indexing allows you to separately access individual elements of a string


A string is an ordered letter or set of letters (or symbols) that will be treated like text. It can be a single element, like one letter or symbol, or a sequence of letters. We can "declare" or define a particular string by enclosing characters in quotes.

In [None]:
str1 = 'a'
str2 = '&'
str3 = '3+4'

print(str1)
print(str2)
print(str3)

In [None]:
print(type(str1))

#### Using the output of one function as the input to another

In the previous cell you'll see `print(type(str1))`.

The code is evaluated inside out, so that the result of using the `type()` function with input `str1` gets immediately used as the input to the print() function.
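
To make the inside-out order concrete, here is the same work written both ways: nested, and as two separate steps. The intermediate variable `type_of_str1` is a name invented for this sketch.

```python
str1 = 'a'

# nested version: type(str1) runs first, then print() receives its result
print(type(str1))

# the same work written as two separate steps
type_of_str1 = type(str1)
print(type_of_str1)
```

Both versions print the same thing. The nested form just skips the intermediate variable.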

### Strings have "lengths" and we can access individual elements

In [None]:
## Create a string variable called months whose value is september
# and a second one called other_months whose value is november
months = "september"
other_months = 'november'

#### The string length corresponds to the number of individual letters or characters in the string

Use the len() function to get the length of a string

In [None]:
print(months)
len(months)

Spaces and other characters can be included in strings and count towards the length.

In [None]:
title = 'War and Peace'
len(title)

In [None]:
email_address = 'me@my.com'
print(email_address)
len(email_address)

### Accessing individual characters or letters from a string using *indexing*

Each character of a string has a single *index* ranging from 0 (the first character) to n-1 (the last character) where n is the length of the string.

Each position also has an index relative to the final position and these are negative numbers.


![strings.png](attachment:strings.png)


<div class="alert alert-warning" role="alert">
  <strong>Beware!</strong><br> 
    Some programming languages start indexing from 0 (Python, Java, C) and others from 1 (Matlab, R). This is something to keep in mind if you are going between languages.
</div>

<div class="alert alert-warning" role="alert">
  <strong>Beware part 2!</strong><br> 
    Except this is not true if you are doing reverse indexing where it starts from -1.
</div>

### Access elements of a string by putting their _index_ position in square brackets after the variable name

In [None]:
new_string = 'Python!'
print(new_string)
len(new_string)

In [None]:
# extract the first character in our new_string
# first index position is 0
new_string[0]

In [None]:
# store the desired index in a variable and use that to
# extract the second character (index position 1) in our new_string
idx = 1

print(new_string)

new_string[idx]

In [None]:
# same thing for a different variable
print(months)
print(months[0])

In [None]:
## Print the third character in the variable new_string
print(new_string)
print(new_string[2])


How about printing the last character in our `new_string`? What if we don't know how many characters there are?

The len() function gives us the total number of elements in a string:

In [None]:
# assign output of len() function to a variable
length_of_string = len(new_string)

# how long is our string?
print(length_of_string)

### Check your understanding

If you have a variable `some_string = 'brooklyn'` you can get a single element by putting the element's index position in square brackets, for example:

>`idx = 3`

>`some_string[idx]`


**If `len(some_string)` tells us how many elements are in `some_string`, why can't we do the following to get the last element?**
> `last_index = len(some_string)`

> `some_string[last_index]`


In [None]:
some_string = 'brooklyn'

# get the length of the string variable
# and store that value in the variable last_idx
last_idx = len(some_string)
print('last_idx: ', last_idx)


In [None]:
# use the last_idx value to try to retrieve
# the final element of some_string

some_string[last_idx]

It didn't work. 

The error is telling us that the string index we requested (8) is 'out of range' meaning
it doesn't exist in this variable.

Can you see why?

<div class="alert alert-warning">
<b>Because we start indexing at 0 the last index position in a string is the length of the string minus 1</b></div>

The problem can be solved by simply subtracting 1 from the length:

In [None]:
# len() function returns a number so we can
# use the minus sign to subtract 1 from it
last_idx = len(new_string)-1
print(last_idx)
print(new_string[last_idx])
print(new_string)

<div class="alert alert-info">
<b>Flexible coding</b>. In the past few cells we've been working on a way to get the last element in a string that is definitely more complicated than simply looking at the string, counting the number of elements yourself, and using that count minus 1 as the index position. The advantage of doing it the way we've done it is that it works for any string, and we never have to look at the string ourselves. <br><br>
Set <code>new_string</code> to any well-formed string and the code will give you the final element.<br><br> 
    
Throughout the course we will work on writing code that is abstracted from the specific values of any variable and instead is flexible enough to work as long as the structure of your code and data is correct. <br><br>
Developing those habits and skills is invaluable in ensuring that your code works as expected and is reusable in new settings.
</div>
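
As a quick illustration of that flexibility, the sketch below runs the same two lines on two different strings. The second string is an arbitrary example chosen for this sketch.

```python
# the same two lines work no matter what the string holds
new_string = 'Python!'
last_idx = len(new_string) - 1
print(new_string[last_idx])

new_string = 'a much longer string of text'
last_idx = len(new_string) - 1
print(new_string[last_idx])
```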

### Indexing from the end of a string

Take another look at this picture:


<div>
<img src="attachment:strings.png" width="75%"/>
</div>

This gives us a way to get the end of a string without explicitly calculating its length.

In [None]:
## Print the last character
new_string[-1]


In [None]:
## Print the second to last character
new_string[-2]
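
Every character can be reached by either kind of index. The sketch below lines up a few positive and negative indices that point at the same character. The names `word`, `n`, and `i` are chosen just for this example.

```python
word = 'Python!'          # example string
n = len(word)             # number of characters

# index 0 and index -n both point at the first character
print(word[0], word[-n])

# index n - 1 and index -1 both point at the last character
print(word[n - 1], word[-1])

# in general, a negative index i refers to position n + i
i = -3
print(word[i], word[n + i])
```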

In [None]:
# print the first character for any string using len and reverse indexing
treatment_type = 'A_second-round'

# get the total number of elements
length = len(treatment_type)

# reverse indexing starts from the end at -1
# and counts down (increasing negatives)
# so set the length value to be negative
# and store it in an idx variable
idx = -length

# use it as the input to square bracket indexing:
print(treatment_type[idx])


In [None]:
# or make it all more compact
treatment_type[-len(treatment_type)]

# as with nested functions, the code inside the square brackets is evaluated first
# so len(treatment_type) is calculated and set to be negative and the result of all
# that is used immediately as the input to the index request []

### Slicing

To extract a group of characters from a string use __slicing__

The slicing syntax is like this:
```
a = 'a_string_of_letters'

a[start:stop]  # items from start value through stop value minus 1

a[start:]      # items from start value through the end of the string

a[:stop]       # items from the beginning up to stop-1 (not including a start position implies using 0)

a[:]           # a copy of the whole string
```
Use square brackets attached to your string variable to get characters from different index positions. `start` and `stop` would correspond to index numbers and the colon means get all the values between the two numbers.
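
Here is a sketch of the four forms applied to the example string from the syntax summary. The specific start and stop numbers are arbitrary choices for illustration.

```python
a = 'a_string_of_letters'

middle = a[2:8]      # index positions 2 through 7
to_end = a[9:]       # index position 9 through the end
from_start = a[:8]   # index positions 0 through 7
everything = a[:]    # the whole string

print(middle)
print(to_end)
print(from_start)
print(everything)
```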

In [None]:
## Print characters from the first position (index=0) to 
# the 4th position (index = 3) but *not* the one in 
# the 5th position (index = 4).
new_string = 'Python!'
print(new_string)
print(new_string[0:4])

In [None]:
# index position 4:
new_string[4]

### Exercise

Put a new string in variable `my_string`, then extract the second to fourth characters. So if your string is "broadway" you would be extracting "roa".

In [None]:
# Two different ways to get from the beginning through the third character
print(new_string)

# set the start position explicitly:
print(new_string[0:3])

# leave the start out and it defaults to 0
print(new_string[:3])


In [None]:
# Get all the characters from the second character (index position 1) through the end
print(new_string)

# leave off the stop position and it defaults to len(new_string), the end of the string
print(new_string[1:])

print(new_string[1:len(new_string)])


<div class="alert alert-info">
Slicing with only a stop value like <code>[:n]</code> gives index positions 0 to n-1.<br>
Slicing with only a start value like <code>[n:]</code> gives index positions n to the end of the string.
</div>
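
One consequence of these two rules is that the same number `n` splits a string cleanly into two pieces with nothing lost or repeated. A small sketch, with example names and values picked for illustration:

```python
word = 'september'
n = 4

first_part = word[:n]    # index positions 0 to n-1
second_part = word[n:]   # index position n to the end

print(first_part)
print(second_part)

# the lengths of the two pieces add up to the length of the original
print(len(first_part) + len(second_part))
print(len(word))
```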

In [None]:
# the `start` and `stop` values in the string slicing can also be stored in variables
start=1
stop=4

print(new_string)
print(new_string[start:stop])

# this becomes useful when you are processing text dynamically and the particular
# portions of a string you want to extract are defined by other aspects of your 
# workflow

In [None]:
# imagine some code that receives user_names in a two part 
# 'family-name, given-name' style and you want to extract the first name

user_name = 'chekhov, anton'

# use the string find() function to locate the position of the comma
# find() is attached to an individual string (like user_name) and 
# takes as input a string you'd like to find inside the calling variable

# this line returns the index position of a comma (',') in user_name
comma_idx = user_name.find(',')
print("comma_idx: ", comma_idx)

The previous cell introduced something new, which is a function or _method_ that is attached to a string variable. In this case we used the find() method. String objects -- individual string variables -- have other things they can do in addition to finding text inside of themselves and we'll return to those a little bit later.

In [None]:
# if the user_name is formed as we expect, we want to
# extract everything from the comma position plus 2 (to 
# account for the space) through the end
user_name[comma_idx+2:]

In [None]:
# and this code is flexible so we can use it for any user_name
# with the expected format

user_name = 'munro, alice'

# extract the first name
comma_idx = user_name.find(',')
first_name = user_name[comma_idx+2:]

# use the first name in some generic text
print(f'hello {first_name}, what would you like to do?')



In [None]:
# slightly cleaner, because in a string representation
# a space is just another character and you can find it directly:

user_name = 'munro, alice'

space_idx = user_name.find(' ')
first_name = user_name[space_idx+1:]
print(f'hello {first_name}, what would you like to do?')



### Exercise

Using the example from before, define a variable called `user_email` and set it to st704@nyu.edu

Use the find() method (attached to the `user_email` variable) and indexed slicing to extract the domain (everything after the @ symbol)

### Combining or concatenating strings

As we just saw in the previous examples, strings can be put together to form larger expressions, like when we took someone's first name stored in a variable and combined it with other text to make a greeting.

One way to do this is to use the plus sign between each string to be combined, whether it is stored in a variable or written directly as a *string literal* (a *string literal* just means the string itself rather than a variable with a string in it).

In [None]:
first_name = 'shannon'
last_name = 'tubridy'

# put together the two variables plus a space between them
full_name = first_name + ' ' + last_name
print(full_name)

# make a file name from variables
file_name = first_name + '_' + last_name + '.txt'
print(file_name)


One thing to keep in mind if you are concatenating strings using '+' is that all of the things being combined need to be strings. If we try to combine a month string with a numeric year it won't work.

In [None]:
month = 'january'
year = 2022

monthyr = month + ' ' + year
print(monthyr)


In order to concatenate a number to a string using '+' we need to first convert the number from type int or float to type str:

In [None]:
month = 'january'
year = 2022

# use the str() function to convert the year to a string
# before concatenating
monthyr = month + ' ' + str(year)
print(monthyr)

Another way to do it is using something called *f-string* formatting. 

Start with lowercase f and then inside of single quotes you put the variables in braces {} and other text just written out:

In [None]:
# make a string that is month, space, 2020
monthyr = f'{month} 2020'
print(monthyr)


In [None]:
# do it with two variables
year = '1999'
monthyr2 = f'{month} {year}'
print(monthyr2)

monthyr_sentence = f'This is {month} in the year {year}'
print(monthyr_sentence)

A nice thing about f-string formatting is that it will automatically convert numbers to strings:

In [None]:
year_num = 2020

monthyr = f'{month} {year_num}'
print(monthyr)
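
To see that the two approaches really end up in the same place, this sketch builds the same text both ways and compares the results with `==`. The names `with_plus` and `with_fstring` are made up for the comparison.

```python
month = 'january'
year_num = 2020

with_plus = month + ' ' + str(year_num)   # manual conversion needed
with_fstring = f'{month} {year_num}'      # conversion happens automatically

print(with_plus)
print(with_fstring)
print(with_plus == with_fstring)
```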


<div class="alert alert-info">
    <em>Putting quotes into a string</em><br>
    If you want a string to include quotation marks or apostrophes, as in the word <i>that's</i>, you can include what's called an <em>escape</em> character (backslash) like in the next code cell. When you print it, the backslash does not appear.
</div>

In [None]:
# add a single quote mark to a string
my_string='that\'s'
print(my_string)

In [None]:
# add double quotes:
my_string='He said, \"Include a back slash as an escape character.\"'
print(my_string)
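
An alternative to escaping, sketched below, is to surround the string with the *other* kind of quote mark. The name `my_string2` and its example sentence are invented for this illustration.

```python
# a single quote inside a double-quoted string needs no backslash
my_string = "that's"
print(my_string)

# double quotes inside a single-quoted string need no backslash either
my_string2 = 'He said, "No escape character needed here."'
print(my_string2)
```

Which style to use is mostly a matter of taste. The backslash is still needed when one string has to contain both kinds of quote mark.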

## String Operations

In addition to holding some text, individual string *objects* (particular instances of a string) can _do_ things. We already saw an example with `user_name.find()`.

These things they can do are called *methods* and the syntax of how they get used is below. In the following list you would replace `string` with the name of your variable, the way we did with `user_name.find(',')` above.

`string.count('x')`                count how many times x appears in the string

`string.find('x')`                   find the position of the first occurrence of x

`string.replace('x', 'y')`       replace x with y

`string.strip()`                     remove all the leading and trailing spaces from a string

`string.join(L)`                    returns a string made by joining the elements of L, with `string` placed between them (example below)

`string.zfill(n)`                 adds enough zeros to the front of the string to make it at least n characters long 


For example:

`participant_email = 'some_person@nyu.edu'`

`participant_email.count('@')`

`participant_email.find('.edu')`

`participant_email.strip()`

string.join() is usually used a little bit differently. It works like this:

`join_string = '_'`

`join_string.join('test')`

would produce

`t_e_s_t`

The value in the variable (_) is used to "join" each of the individual elements in the input to join().
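
In practice the input to join() is often a list of strings rather than a single word. Lists have not been introduced in this notebook, so treat the square-bracket syntax below as a preview. The names `parts` and `file_stem` are hypothetical.

```python
# a list of strings, written with square brackets and commas
parts = ['participant001', 'conditionY', 'section2']

join_string = '_'

# the underscore is placed between each element of the list
file_stem = join_string.join(parts)
print(file_stem)
```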




#### Count how many times some substring appears in another string

In [None]:
# count occurrences:
participant_email = 'some_person@nyu.edu'

participant_email.count('@')

In [None]:
participant_email = 'some_person@nyu.edu'
participant_email.count('e')

####  Find the index position of a substring in another string

The find() method returns the index of first occurrence of the substring (if found). If not found, it returns -1.

In [None]:
# find the index where a substring starts:
participant_email.find('.edu')

In [None]:
# find() returns a -1 if the substring does not
# appear in the calling variable
participant_email.find('.com')
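
The -1 result matters when the output of find() is fed straight into a slice. The sketch below shows what can go wrong and one possible guard. The `if`/`else` check is syntax this notebook has not introduced, and the input string is a made-up example with no comma in it.

```python
user_name = 'anton chekhov'          # hypothetical input with no comma

comma_idx = user_name.find(',')
print(comma_idx)                     # -1 because there is no comma

# slicing without checking: -1 + 2 is 1, so the slice quietly
# starts at index position 1 and we get the wrong text
print(user_name[comma_idx + 2:])

# checking for -1 first
if comma_idx == -1:
    first_name = user_name           # fall back to the whole string
else:
    first_name = user_name[comma_idx + 2:]
print(first_name)
```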

In [None]:
# find returns the first occurrence only
# the first occurrence of 2 is at index position 10 (the second occurrence is ignored)
monthyr = 'september 2020'
print(monthyr.find('2'))

In [None]:
# convince ourselves that find works correctly and gives the right idx
random_str = 'pdodanclklafnadncpeoajdflkj;lkj'
idx = random_str.find('o')
print(idx)

# extract the character at the identified index position
print(random_str[idx])

#### Find and replace a portion of a string

string.replace(x,y) will replace any text matching the first input (x) with whatever is in the second (y)

In [None]:
month = 'november'
month.replace('nov','sept')

In [None]:
# replace part of the string with another part:
participant_email = "some_person@nyu.edu"

print(f'participant_email before replace: {participant_email}')

# use string method replace, first input gets replaced with second
participant_email.replace('some_person','shannon.tubridy')

In [None]:
# check the value of participant_email variable... it didn't change
print(participant_email)

**Note that doing participant_email.replace() showed us the result of doing the replacement but when we look at the `participant_email` variable it still says `some_person@nyu.edu`.**

This is because the .replace() command does not change the string **in place**

In place means a command is carried out _and changes the value of the variable directly_

Other commands like .replace() will output the result of the command either to the output of your cell or to a new variable.

So to get the results of .replace() into a variable we can either output the results to a new variable or update that old variable with the new values

In [None]:
new_email = participant_email.replace('some_person', 'st704')
print(f'participant_email value: {participant_email}')
print(f'new_email value: {new_email}')

`str.replace()` can also be used to remove text from a string by using '' as the replacement value (that's two quote marks with nothing, not even a space, inside) 

In [None]:
monthyr = 'september 2020'
# get rid of 2020
print(monthyr.replace(' 2020',''))
print(monthyr)

In [None]:
monthyr = 'september 2020'
# get rid of 2020 and update the variable
monthyr = monthyr.replace(' 2020','')
print(monthyr)

#### Zero padding a string

One common thing we do with strings is padding them with zeros. For example, we might be making filenames of the form 

`participantN_conditionY_section2.txt`

where N is a number assigned to each participant. It can be desirable for all of the filenames to have a matching format, so that we have things like:

`participant001_conditionY_section2.txt`

`participant002_conditionY_section2.txt`

`participant013_conditionY_section2.txt`

`participant100_conditionY_section2.txt`

Rather than:

`participant1_conditionY_section2.txt`

`participant2_conditionY_section2.txt`

`participant13_conditionY_section2.txt`

`participant100_conditionY_section2.txt`



The string method zfill() can do this for us:

In [None]:
## Pad a string with zeros to make it a desired length

# initialize a string
s = '20'

# add enough zeros to make the string four characters
print(s.zfill(4))

# add enough zeros to make the string 10 characters
print(s.zfill(10))

# add enough zeros to make the string 2 characters
print(s.zfill(2))

# can do it with any string but it usually makes the most sense with numbers
s = 'nyc'
print(s.zfill(5))
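
Putting the pieces together, here is one way the participant filenames from the start of this section could be built. It is a sketch only: the participant number and the names `participant_number` and `padded` are invented for illustration.

```python
participant_number = 13            # hypothetical participant id stored as an int

# zfill() is a string method, so convert the number to a string first
padded = str(participant_number).zfill(3)
print(padded)

# drop the padded number into the filename with an f-string
file_name = f'participant{padded}_conditionY_section2.txt'
print(file_name)
```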

### Learning about other string methods

There are a number of methods that are attached to strings that we didn't see in this notebook. You can explore them at the following link:

https://www.w3schools.com/python/python_ref_string.asp
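
Of the methods listed earlier, strip() did not get its own code cell, so here is a brief sketch. The variable names and the example text (imagine a typed response with stray spaces around it) are made up.

```python
raw_response = '  yes   '          # hypothetical input with extra spaces

# strip() returns a new string without the leading and trailing spaces
cleaned = raw_response.strip()

print(len(raw_response))
print(len(cleaned))
print(cleaned)
```

Like replace(), strip() does not change the original string in place, so the result has to be stored in a variable to be kept.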

# Numeric operations



#### Numeric Data Structures and Operations

![num_operate.png](attachment:num_operate.png)


In [None]:
## Add two integers, get an integer
1+3

In [None]:
# store the result in a variable
a = 1+3
print(a)

In [None]:
## Add two floats, get a float
1.0 + 1.5

In [None]:
## Multiply an integer by a float, get a float
3*9.2

In [None]:
## Divide an integer by an integer reassigned as a float, get a float
4/float(2)

Negate a value using -

In [None]:
x = 3
y = -1 * x
print(x)
print(y)

Multiply using the * symbol

In [None]:
a = 3
b = 5
c = a*b
print(c)

**Division**

x/y  quotient for x divided by y

x%y  remainder of x divided by y

x//y  floored quotient for x divided by y (quotient rounded down) -- aka how many whole times could y fit in x

divmod(x, y) returns two values, first is the floored quotient, second is remainder

In [None]:
a = 3
b = 5

# do the division
print(b/a)

# verify that b/a * a gets us back to b
print((b/a)*a)


In [None]:
# remainder of b divided by a
b%a

In [None]:
# floored quotient of b divided by a
b//a

In [None]:
# get the remainder only
print(b%a)

# get the floored (rounded down) quotient
print(b//a)


# the original numerator should be the denominator times the floored quotient plus the remainder:
print((a*(b//a)+(b%a)))

Take a look at the last line in the previous code cell, where we used parentheses to control the order of operations and make sure that we multiplied `a` by the result of `b//a` and added that to the result of `b%a`. Python will work inside out, evaluating the innermost parenthetical expressions first.
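
A short sketch of how parentheses change a result, followed by the same numerator check with different example numbers. The variable names and values are chosen just for this illustration.

```python
# without parentheses, multiplication happens before addition
no_parens = 2 + 3 * 4
print(no_parens)

# parentheses force the addition to happen first
with_parens = (2 + 3) * 4
print(with_parens)

# rebuilding the numerator from floored quotient and remainder
a = 3
b = 17
rebuilt = a * (b // a) + (b % a)
print(rebuilt)
```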



The 'modulo' or % operator that returns the remainder is useful for checking whether a number is odd or even. Even numbers divided by 2 will always have remainder 0.

In [None]:
a = 1111988729115
print(a%2)

b = 4
print(b%2)

c = 1111111111111111348
print(c%2)
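
Combining the remainder with the `==` comparison from the start of the notebook turns the check into a direct True/False answer. A minimal sketch, with `n` and `is_even` as made-up names:

```python
n = 10
is_even = (n % 2 == 0)   # remainder 0 means the number is even
print(is_even)

n = 7
is_even = (n % 2 == 0)
print(is_even)
```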

We can also use the _function_ `divmod()` that will give us the floored quotient and remainder all at once

`divmod()` has two inputs: first is the numerator, second is denominator

In [2]:
b = 4
a = 5
divmod(b, a)

# 4 divided by 5 is 0 with remainder 4:

(0, 4)

In [3]:
# get the outputs of divmod into a variable:
results = divmod(b, a)
print(results)


# The result is a 'tuple' which is a datatype we haven't learned about yet
# but we can get individual elements using indexing similar to a string

# get the first entry in results:
print(results[0])

# get the second entry in results:
print(results[1])
# print(remainder)

(0, 4)
0
4


In [None]:
# for functions that give multiple outputs we can also
# gather the outputs into separate variables like this:
floored_q, remainder = divmod(b, a)
print(floored_q)
print(remainder)
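
As one possible everyday use of `divmod()`, the sketch below turns a number of seconds into minutes and leftover seconds. The duration and the variable names are hypothetical.

```python
total_seconds = 200   # hypothetical duration

# floored quotient is the whole minutes, remainder is the leftover seconds
minutes, seconds = divmod(total_seconds, 60)

print(f'{minutes} min {seconds} sec')
```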

**Using exponents** Raise a number to some power using `x**y` or `pow(x,y)`. Both of those will calculate $x^y$.

$4^2$

In [None]:
x=4
x**2

In [None]:
x=4
y=2
x**y

$5^{-2}$

In [None]:
x=5
y=-2
x**y

In [None]:
x=4
y=2
squared_x = pow(x, y)
print(squared_x)
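
Exponents do not have to be positive whole numbers. The sketch below tries a fractional and a negative exponent, and confirms that `pow()` and `**` agree. The variable names are for illustration only.

```python
# an exponent of 0.5 gives the square root
root = 9 ** 0.5
print(root)

# a negative exponent gives 1 divided by the positive power
half = 2 ** -1
print(half)

# pow() and ** give the same answer
same = pow(2, 10) == 2 ** 10
print(same)
```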

**Updating a variable value through addition, subtraction, division, etc**

It is often useful to take a variable with some numeric value, and update the value by adding to it, subtracting from it, multiplying it and so on.

The `+=` syntax makes this compact to write.

In [None]:
a = 3
print(f'a = {a}')


# compact way of writing: a = a+1
a += 1
print(f'now a = {a}')


# compact way of writing: a = a-2:
a -= 2
print(f'now a = {a}')


# compact way of writing: a = a/2:
a /= 2
print(f'now a = {a}')

# compact way of writing: a = a*10:
a *= 10
print(f'now a = {a}')

# compact way of writing: a = a**100:
a **= 100
print(f'now a = {a}')
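
The same compact pattern also exists for the floored-quotient and remainder operators introduced earlier. A brief sketch, using made-up variable names and values:

```python
count = 17

# compact way of writing: count = count // 5
count //= 5
print(count)

remaining = 17

# compact way of writing: remaining = remaining % 5
remaining %= 5
print(remaining)
```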



# Summary

This notebook provided an introduction to two of the fundamental Python data types: numbers and strings. It also demonstrated some of the operations that can be done with strings and numbers, like finding and replacing text, combining strings together, doing basic math and so on. 

There is still much to learn of course, and this is not the end of these topics, but this should be enough to get us started on building more interesting code.