# Strings, Lists, and Dictionaries

**By Arpit Omprakash, Byte Sized Code**

## Strings

**A Quick Recap:**  
A string is a data type in Python that's used to represent a piece of text.  
- It's enclosed between quotes, either double or single.
- We can concatenate strings to form longer strings by using the plus sign.
- We can multiply a string by a number to repeat it that many times.
- The `len` function tells us the number of characters present in a string.

But there's so much more that can be done with strings in Python!!

### The Parts of a String

**String Indexing** is an operation that lets us access the character in a given position (or index) using square brackets and the number of the position we want to access.  
It should be noted that strings are **zero-indexed**, i.e., the position numbering starts with a `zero` and ends with the `length of string - 1`.

In [1]:
var = "python"

In [2]:
print(var[0])

p


In [3]:
print(var[1])

y


In [4]:
print(var[len(var) - 1])

n


What if we don't know the length of the variable we are indexing and we can't use the `len` function?  
We can still get the character at the last position by something called a negative index.  

**Negative indexes** let us access the positions in the string starting from the last one.  
NOTE: negative indexes start at `-1` and continue down to `-length of string`.

In [5]:
print(var[-1])

n


In [6]:
print(var[-6])

p
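
The relationship between negative and positive indexes can be sketched as follows. This is a minimal illustration; the variable name `word` is chosen for this example only.

```python
# For a string of length n, index -k refers to the same character as index n - k.
word = "python"
n = len(word)

for k in range(1, n + 1):
    # word[-1] is word[5], word[-2] is word[4], ..., word[-6] is word[0]
    print(-k, n - k, word[-k], word[n - k])
```

Each printed row shows a negative index, its positive counterpart, and the (identical) character that both of them select.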


Apart from accessing a single character from a string, we can also access a slice from a string.  
A **slice** is a portion of a string that can contain more than one character; also sometimes called a substring.  
This is achieved by creating a range inside the square brackets separated by a colon(:).  
- `var[m:n]` gives us the slice from position `m` to position `n-1`
- `var[m:]` gives us the slice from position `m` to the end of the string
- `var[:n]` gives us the slice from the beginning to position `n-1`

In [7]:
print(var[3:5])

ho


In [8]:
print(var[4:])

on


In [9]:
print(var[:3])

pyt


### Creating New Strings & Substrings

What if we want to change a part of a string?  
Suppose we had a typo:

In [10]:
var = "pythin"
print(var)

pythin


Let's try changing the value at the given index

In [11]:
var[4] = "o"

TypeError: 'str' object does not support item assignment

As we see above, that gives us an error. But why?  
Strings are what are called **immutable** data types.  
That is a fancy way of saying that once we create a string, we can't change its contents.  
Then what can we do here?  
We can simply reassign the variable by correcting the typo or better, create a new string based on the old one.

In [12]:
var = var[:4] + "o" + var[5:]
print(var)

python


What if we know the typo, but we don't know where the typo is?  
We can use the string method `index`, which returns the position of the first occurrence of a given substring.

In [13]:
pets = "cat and pythin"

In [14]:
print(pets.index("i"))

12


We can also use `index` to search for a substring that is longer than one character.

In [15]:
print(pets.index("and"))

4


However, we need to be really sure that the substring we are searching for is present in the string to prevent any errors.

In [16]:
print(pets.index("dog"))

ValueError: substring not found

How can we make sure that a substring is present in a string?  
We can use the `in` keyword.  
It's basically like in English.

In [17]:
print("cat" in pets)

True


In [18]:
print("dog" in pets)

False
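
Putting the two ideas together, one possible way to avoid the `ValueError` is to check with `in` before calling `index`. The helper name `find_position` and its `-1` return value are assumptions made for this sketch.

```python
def find_position(text, substring):
    # Return the position of substring in text, or -1 if it is absent
    if substring in text:
        return text.index(substring)
    return -1

pets = "cat and pythin"
print(find_position(pets, "and"))  # 4
print(find_position(pets, "dog"))  # -1
```

Python strings also have a built-in `find` method that behaves this way: it returns `-1` instead of raising an error when the substring is missing.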


Let's have a look at a real-world example where we can use the things we have learned so far.  
**P.S.** It is assumed that you have already worked through the "Basic Syntax" and "Loops and Recursions" notebooks, so we will use things covered there without much explanation.

In [19]:
# Changes the domain of a given email from the old domain to the new domain
def change_domain(email, old_domain, new_domain):
    if "@" + old_domain in email:
        ind = email.index("@" + old_domain)
        new_email = email[:ind] + "@" + new_domain
        return new_email
    return email

In [20]:
result = change_domain("xyz@example.com", "example.com", "google.com")
print(result)

xyz@google.com
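
One thing to watch for: `change_domain` checks whether `"@" + old_domain` appears anywhere in the address, so an address such as `xyz@example.com.au` would also be rewritten and would lose its `.au` ending. A possible stricter variant, sketched below, only rewrites addresses that end with the old domain. It relies on the `endswith` method, which is covered in the next section; the name `change_domain_strict` is made up for this sketch.

```python
def change_domain_strict(email, old_domain, new_domain):
    suffix = "@" + old_domain
    # Only rewrite when the address actually ends with the old domain
    if email.endswith(suffix):
        return email[:len(email) - len(suffix)] + "@" + new_domain
    return email

print(change_domain_strict("xyz@example.com", "example.com", "google.com"))     # xyz@google.com
print(change_domain_strict("xyz@example.com.au", "example.com", "google.com"))  # xyz@example.com.au
```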


### More String Functions and Methods

**UPPER**  
This method converts all letters in the string to uppercase.

In [21]:
print("Mountains".upper())

MOUNTAINS


**LOWER**  
This method converts all letters in the string to lowercase.

In [22]:
print("MounTaiNS".lower())

mountains


**STRIP**  
This method removes leading and trailing whitespace (spaces, tabs, and newlines) from the string.  
We can use the more specialized **lstrip** to remove only the leading whitespace and **rstrip** to remove only the trailing whitespace.

In [23]:
var = " yea "
var

' yea '

In [24]:
var.strip()

'yea'

In [25]:
var.lstrip()

'yea '

In [26]:
var.rstrip()

' yea'
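
Two details are easy to miss here, and the sketch below illustrates both: these methods remove tabs and newlines as well as spaces, and, because strings are immutable, they return a new string rather than changing the original. The variable names `raw` and `cleaned` are just examples.

```python
raw = "\t  yea \n"

cleaned = raw.strip()
print(repr(cleaned))   # 'yea'
print(repr(raw))       # the original string is unchanged

# Whitespace in the middle of the string is never touched
print(repr("  hello   world  ".strip()))  # 'hello   world'
```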

**COUNT**  
Returns how many times a given substring appears within a string.

In [27]:
print("The number of times e occurs in this string is 4".count("e"))

4


**ENDSWITH**  
This method returns whether the string ends with a certain substring.

In [28]:
print("Forest".endswith("rest"))

True


**ISNUMERIC**  
This method returns whether the string is composed of only numeric characters.

In [29]:
print("name".isnumeric())

False


In [30]:
print("1234".isnumeric())

True


**INT**  
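If a string made up of the digits 0-9 returns `True` for `isnumeric`, we can use the `int` function to convert it into an integer. (`isnumeric` also accepts some other Unicode numeric characters, such as `½`, that `int` cannot convert.)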

In [31]:
print(int("1234"))

1234
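
A short sketch of how such a check might guard a conversion. The helper name `to_int_or_none` is hypothetical. It uses `isdecimal`, a stricter relative of `isnumeric` that only accepts decimal digits; neither method accepts a minus sign or a decimal point, so `"-5"` and `"3.14"` are rejected.

```python
def to_int_or_none(text):
    # isdecimal() is True only when every character is a decimal digit
    if text.isdecimal():
        return int(text)
    return None

print(to_int_or_none("1234"))   # 1234
print(to_int_or_none("name"))   # None
print(to_int_or_none("-5"))     # None (the minus sign is not a digit)
print(to_int_or_none("3.14"))   # None (the dot is not a digit)
```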


**JOIN**  
The `join` method concatenates a list of strings, placing the string it is called on between the items.  
The syntax is as follows:
```
"what_to_join_with".join(strings_to_join)
```

In [32]:
print("-".join(["Ben", "25", "Cal"]))

Ben-25-Cal


**SPLIT**  
We can also split a string into a list of strings.  
By default it splits on whitespace, but we can change that behaviour by passing a separator as an argument.

In [33]:
print("wow. this is great".split())

['wow.', 'this', 'is', 'great']


In [34]:
print("wow. this is great".split("."))

['wow', ' this is great']
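
`split` and `join` are often used together. The sketch below uses made-up sample strings: it splits on one separator and joins with another, and then uses the default whitespace split to collapse repeated spaces.

```python
date = "2020-05-17"             # example value, not from the cells above
parts = date.split("-")
print(parts)                    # ['2020', '05', '17']
print("/".join(parts))          # 2020/05/17

messy = "wow.   this   is    great"
# split() with no argument treats any run of whitespace as one separator
print(" ".join(messy.split()))  # wow. this is great
```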


### String Formatting

Python strings provide a powerful way to build formatted text on the fly, without having to concatenate smaller pieces into a larger string.  
Here is an example:

In [35]:
name = "amish"
number = len(name) * 3
print("Hi {} your lucky number is {}".format(name, number))

Hi amish your lucky number is 15


We used the format method on the string and passed on the variables that we want to substitute the curly braces with **in order**. This leads to the name being substituted for the first curly bracket and the number being substituted for the second curly bracket.  
Notice that we didn't have to convert the number from integer to string, the format method does this for us! So glad we have it.

But wait, there's even more!  
By using certain expressions inside the curly brackets we can further enhance the string formatting operation. Let's have a look.

In [36]:
print("Your lucky number is {number}, {name}".format(name=name, number=len(name)*5))

Your lucky number is 25, amish


Because we are using placeholders for the variable names, the order in which the variables are passed doesn't matter now.  
But also notice that we had to modify the way in which we present the variables to the format method as arguments.

In [37]:
price = 10.5
with_tax = price * 1.05
print("Base price: Rs{:.2f}, with tax: Rs{:.2f}".format(price, with_tax))

Base price: Rs10.50, with tax: Rs11.03


The price with tax works out to about 11.025, but three decimal places for a price is a bit of overkill as we don't have the smaller denominations anymore. Two decimal places seems reasonable though.  
Here we are using what are called formatting expressions inside the curly brackets to round the values to two decimal places.  

The colon (:) indicates that we are starting our formatting expression.  
After the colon, we write `.2f`:
- `f` means we are formatting the value as a floating-point number
- `.2` means there should be two digits after the decimal point

Here's another example:

In [38]:
def to_celsius(x):
    return (x - 32) * 5 / 9

for x in range(0, 101, 10):
    print("{:>3} F | {:>6.2f} C".format(x, to_celsius(x)))

  0 F | -17.78 C
 10 F | -12.22 C
 20 F |  -6.67 C
 30 F |  -1.11 C
 40 F |   4.44 C
 50 F |  10.00 C
 60 F |  15.56 C
 70 F |  21.11 C
 80 F |  26.67 C
 90 F |  32.22 C
100 F |  37.78 C


The expressions now contain a greater-than sign (`>`), which tells the `format` method to align the values to the right.  
In the first expression the value is right-aligned in a field three characters wide; in the second expression the field is six characters wide.  
We also want the decimal numbers to have two decimal places in the second expression.
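
For comparison, here is a small sketch of the three alignment options that formatting expressions support: `<` (left), `^` (center), and `>` (right). The field width of 10 and the sample word are arbitrary choices for this illustration.

```python
word = "py"

# The vertical bars only make the padding visible
print("|{:<10}|".format(word))   # |py        |
print("|{:^10}|".format(word))   # |    py    |
print("|{:>10}|".format(word))   # |        py|
```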

## Lists

Lists are another kind of data type that we have used but not really described till now.  
Lists hold collections of items, for example, collections of strings or numbers.  
Lists are enclosed in square brackets and the items in a list are separated by a comma.

In [39]:
x = ["lists", "are", "awesome"]
print(type(x))

<class 'list'>


Lists and strings share a lot of features. This is because both of them are what Python calls **sequences**.  
Sequences have the following properties:
- they can be iterated over by using `for` loops
- they can be sliced and indexed
- `len(sequence)` returns the number of items in sequence
- they can be concatenated using the plus sign
- they support the use of the `in` keyword

Let's have a quick run through these features:

**LEN**  
`len` returns the number of elements in a list.

In [40]:
print(len(x))

3


**IN**   
`in` can be used to check if a given item is present in the list

In [41]:
print("are" in x)

True


**Indexing**  
Lists are indexed the same way as strings.  
Lists are also **zero-indexed** and we can create slices of lists.

In [42]:
print(x[0])

lists


In [43]:
print(x[:3])

['lists', 'are', 'awesome']


In [44]:
print(x[2:])

['awesome']


**Concatenating Lists**

In [45]:
print(["lists", "are"] + ["awesome"])

['lists', 'are', 'awesome']
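
Since strings are sequences too, the same operations work on them unchanged. A minimal sketch, using a hypothetical helper `describe` that relies only on sequence features:

```python
def describe(sequence):
    # Works for any sequence: uses only len, indexing, and slicing
    return len(sequence), sequence[0], sequence[-1], sequence[:2]

print(describe(["lists", "are", "awesome"]))  # (3, 'lists', 'awesome', ['lists', 'are'])
print(describe("awesome"))                    # (7, 'a', 'e', 'aw')
```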


### List specific functions

Unlike strings, lists are **mutable**, thus we can modify a list in place without having to build it again.  
This gives rise to various list specific functions that are helpful in modifying the contents of a list.

**APPEND**  
Adds an element to the end of a list

In [46]:
fruits = ["pineapple", "banana", "cherry"]
print(fruits)
fruits.append("apple")
print(fruits)

['pineapple', 'banana', 'cherry']
['pineapple', 'banana', 'cherry', 'apple']


**INSERT**  
Inserts an element at the specified index of the list.  
If you use an index higher than the current length, the element gets added to the end of the list.

In [47]:
fruits.insert(0, "orange")
print(fruits)

['orange', 'pineapple', 'banana', 'cherry', 'apple']


In [48]:
fruits.insert(20, "mango")
print(fruits)

['orange', 'pineapple', 'banana', 'cherry', 'apple', 'mango']


**REMOVE**  
Removes a given element from the list.  
NOTE: You get an error if the element is not present in the list.

In [49]:
fruits.remove("apple")
print(fruits)

['orange', 'pineapple', 'banana', 'cherry', 'mango']


In [50]:
fruits.remove("jackfruit")

ValueError: list.remove(x): x not in list

**POP**  
The `pop` method removes the element at the provided index and returns it.

In [51]:
fruits.pop(1)
print(fruits)

['orange', 'banana', 'cherry', 'mango']
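
`pop` also hands back the element it removes, which is useful when the value is still needed. Here is a small sketch with a separate example list (`queue` is a name chosen for this illustration); calling `pop()` without an index removes the last element.

```python
queue = ["orange", "banana", "cherry", "mango"]

first = queue.pop(0)   # remove and return the element at index 0
last = queue.pop()     # no index: remove and return the last element

print(first, last)     # orange mango
print(queue)           # ['banana', 'cherry']
```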


**REASSIGNMENT**  
Finally, we can reassign values inside a list using indexing.

In [52]:
fruits[0] = "apple"
print(fruits)

['apple', 'banana', 'cherry', 'mango']


### Lists and Tuples

Tuples are also collections of elements like lists, but they are **immutable** like strings.  
Tuples are denoted by parentheses, ().  
Like lists and strings, tuples are sequences in Python.  
Thus, we can also index tuples, get their lengths, use the keyword `in`, and iterate over them.

Why tuples?  
There may be some cases where we want a collection of items, but we don't want it to be modifiable.  
For example, it makes sense to store a name that we know won't change inside a tuple.  
If it were stored in a list, we could change a value without even realizing it, or add another item that wouldn't make any sense.

In [53]:
fullname = ("arpit", "omprakash")
print(fullname)

('arpit', 'omprakash')
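
Trying to modify a tuple fails in the same way as trying to modify a string. The sketch below catches the error so that the cell runs to completion; the `try`/`except` block is used here only for that purpose.

```python
fullname = ("arpit", "omprakash")

try:
    fullname[0] = "amish"          # tuples do not support item assignment
except TypeError as error:
    print("TypeError:", error)

print(fullname)                    # the tuple is unchanged
```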


On the other hand, functions that return more than one value generally use a tuple to pack the values into.  
This prevents the values from being modified in transit.  
Remember the following example?

In [54]:
def convert_seconds(seconds):
    hours = seconds // 3600
    minutes = (seconds - hours * 3600) // 60
    remaining_seconds = seconds - hours * 3600 - minutes * 60
    return hours, minutes, remaining_seconds

In [55]:
result = convert_seconds(14400)
print(type(result))
print(result)

<class 'tuple'>
(4, 0, 0)


**UNPACKING**  
We can turn a tuple of n elements into n different variables.  
Let's see an example with the tuple above:

In [56]:
hours, minutes, seconds = result
print(hours, minutes, seconds)

4 0 0
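
Unpacking can also be done directly on the function call, as sketched below. The function is repeated so that the example runs on its own, and the input value is arbitrary. The number of names on the left must match the number of elements in the tuple, otherwise Python raises a `ValueError`.

```python
def convert_seconds(seconds):
    hours = seconds // 3600
    minutes = (seconds - hours * 3600) // 60
    remaining_seconds = seconds - hours * 3600 - minutes * 60
    return hours, minutes, remaining_seconds

# Unpack the returned tuple in one step
hours, minutes, seconds = convert_seconds(5000)
print(hours, minutes, seconds)   # 1 23 20
```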


### Iterating over Lists and Tuples

As lists and tuples are sequences, we can iterate over them using a `for` loop.

In [57]:
animals = ["Lion", "Zebra", "Elephant", "Giraffe"]
chars = 0
for animal in animals:
    chars += len(animal)
print("Total characters: {}, Average length: {}".format(chars, chars/len(animals)))

Total characters: 24, Average length: 6.0


We can also use indexing to iterate over a list/tuple and simultaneously use the index for other things in the code.  
Here's an example:

In [58]:
winners = ("Ashley", "Dylan", "Reese")
for index in range(len(winners)):
    print("{} - {}".format(index + 1, winners[index]))

1 - Ashley
2 - Dylan
3 - Reese


This is particularly useful if you want to iterate over a list and at the same time modify the list.  
For example:

In [59]:
family = ["Rob", "Tom", "Harry"]

for index in range(len(family)):
    family[index] = family[index] + " Hudson"
print(family)

['Rob Hudson', 'Tom Hudson', 'Harry Hudson']


**CAUTION**  
However, you should be really careful while iterating over a list and removing items.  
This may lead to some really weird effects and errors:

In [60]:
numbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
for index in range(len(numbers)):
    numbers.remove(numbers[index])
print(numbers)

IndexError: list index out of range

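Each `remove` shortens the list, so the indexes computed from the original length eventually run past its end.  
It is wise to iterate over a copy of the list while removing elements from the original.  
You can find more about why this happens [here](https://docs.python.org/2/tutorial/controlflow.html#for-statements)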

In [61]:
numbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
numbers_copy = numbers.copy()
for index in range(len(numbers_copy)):
    numbers.remove(numbers_copy[index])
print(numbers)

[]
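
When only some elements should go, another common approach is to leave the original list alone and collect the items to keep in a new list. A minimal sketch, assuming we want to drop the even numbers; the name `kept` and the even/odd condition are made up for this example.

```python
numbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

kept = []
for number in numbers:
    # Keep only the odd numbers; the list being iterated is never modified
    if int(number) % 2 != 0:
        kept.append(number)

print(kept)   # ['1', '3', '5', '7', '9']
```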


**Enumerate**  
We can also use the `enumerate` function to get the index and item simultaneously from a list/tuple.

In [62]:
for index, person in enumerate(winners):
    print("{} - {}".format(index + 1, person))

1 - Ashley
2 - Dylan
3 - Reese
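
`enumerate` also accepts a `start` argument, so the `index + 1` adjustment is not needed. A small sketch with the same tuple, repeated here so that the example is self-contained:

```python
winners = ("Ashley", "Dylan", "Reese")

# start=1 makes the counter begin at 1 instead of 0
for position, person in enumerate(winners, start=1):
    print("{} - {}".format(position, person))
```

This prints the same three lines as the cell above.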


A slightly complex example:  

Suppose we get a list of names and email addresses and we have to format it to a different format.  
*Input:*  
[(name1, email1), (name2, email2), (name3, email3)]  
*Output:*  
[name1\<email1>, name2\<email2>, name3\<email3>]

In [63]:
old_list = [("Robin", "robin@gmail.com"), ("Usain", "bolt@gmail.com"), ("Alan", "turing@gmail.com")]
new_list = []
for name, email in old_list:
    new_list.append(name + "<" + email + ">")
print(new_list)

['Robin<robin@gmail.com>', 'Usain<bolt@gmail.com>', 'Alan<turing@gmail.com>']


### List Comprehensions

We generally use a `for` loop for creating lists based on sequences.  
For example:

In [64]:
multiples = []
for i in range(10):
    multiples.append(7 * i)
print(multiples)

[0, 7, 14, 21, 28, 35, 42, 49, 56, 63]


Because creating lists like this is a pretty routine task, Python offers a simpler way to do it: list comprehensions.  
The syntax is as follows:
```
[expression for item in sequence]
```

Here's an example to make things clearer:

In [65]:
new_multiples = [7 * i for i in range(10)]
print(new_multiples)

[0, 7, 14, 21, 28, 35, 42, 49, 56, 63]


We can also use conditionals in list comprehensions to create more complex lists in one line.  
For example, creating the list of multiples of three up to a hundred.

In [66]:
multiple_of_3 = [x for x in range(1, 101) if x % 3 == 0]
print(multiple_of_3)

[3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99]
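
The name and email example from the previous section can also be written as a list comprehension. This is a sketch of one possible rewrite; the data is repeated so that the cell runs on its own.

```python
old_list = [("Robin", "robin@gmail.com"), ("Usain", "bolt@gmail.com"), ("Alan", "turing@gmail.com")]

# Each tuple is unpacked into name and email, just like in the for loop
new_list = ["{}<{}>".format(name, email) for name, email in old_list]
print(new_list)
```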


## Dictionaries

Like lists, dictionaries are collections of data.  
Unlike lists, however, the data inside dictionaries takes the form of pairs of keys and values.  
To get the value from a dictionary, we use the key.  

The term dictionary comes from how dictionaries work in the real world.  
We search for a given word(key) and then pull out its meaning(value) from the dictionary.

A dictionary is created by enclosing key-value pairs in curly brackets {}.  
Each key is separated from its value by a colon (:), and the pairs are separated by commas.  
Values can be of any data type. Keys, however, must be unique and immutable, so strings, numbers, and tuples work as keys, but lists and dictionaries do not.

In [67]:
file_counts = {"jpg": 10, "txt": 30, "csv": 3, "py":50}
print(type(file_counts))
print(file_counts)

<class 'dict'>
{'jpg': 10, 'txt': 30, 'csv': 3, 'py': 50}
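
To illustrate the rule about keys, here is a hedged sketch: a tuple works as a key, while a list is rejected with a `TypeError`. The `grid` dictionary and its contents are invented for this example, and the `try`/`except` block is only there so that the cell runs to completion.

```python
grid = {}

grid[(0, 1)] = "treasure"        # a tuple is immutable, so it can be a key
print(grid[(0, 1)])

try:
    grid[[0, 1]] = "trap"        # a list is mutable, so it cannot be a key
except TypeError as error:
    print("TypeError:", error)

print(grid)                      # only the tuple key was stored
```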


We can access values by querying the keys to the dictionary.

In [68]:
print(file_counts['py'])

50


We can also use the `in` keyword to check if a key is present in a dictionary.

In [69]:
print('jpg' in file_counts)

True
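
Looking up a key that is not in the dictionary raises a `KeyError`, so the `in` check is a handy guard. A minimal sketch follows; the dictionary is repeated so that the example runs on its own, and the `get` method shown at the end is a built-in alternative that returns a default value instead of raising an error.

```python
file_counts = {"jpg": 10, "txt": 30, "csv": 3, "py": 50}

# Guard the lookup with `in`
if "mp3" in file_counts:
    print(file_counts["mp3"])
else:
    print("no mp3 files")

# get returns the second argument when the key is missing
print(file_counts.get("mp3", 0))   # 0
print(file_counts.get("py", 0))    # 50
```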


### Modifying Dictionaries

Dictionaries are **mutable**, thus we can change the contents of a dictionary easily.  
We can add entries by simply creating a new key value pair as follows:

In [70]:
file_counts['png'] = 5
print(file_counts)

{'jpg': 10, 'txt': 30, 'csv': 3, 'py': 50, 'png': 5}


Assigning a value to an existing key overwrites the old value!

In [71]:
file_counts['py'] = 28
print(file_counts)

{'jpg': 10, 'txt': 30, 'csv': 3, 'py': 28, 'png': 5}


We use the `del` keyword to delete keys from the dictionary.

In [72]:
del file_counts['txt']
print(file_counts)

{'jpg': 10, 'csv': 3, 'py': 28, 'png': 5}


### Iterating over Dictionaries  

We can also use a `for` loop to iterate over a dictionary.

In [73]:
for extensions in file_counts:
    print(extensions)

jpg
csv
py
png


We can also get the values by using the keys inside our `for` loop body.  
Or we can use the `items()` method to get each key-value pair from the dictionary.

In [74]:
for extensions in file_counts:
    print(extensions, file_counts[extensions])

jpg 10
csv 3
py 28
png 5


In [75]:
for extensions, counts in file_counts.items():
    print(extensions, counts)

jpg 10
csv 3
py 28
png 5


We can even access just the keys or values of a dictionary by using the `keys()` and `values()` methods.

In [76]:
for key in file_counts.keys():
    print(key)

jpg
csv
py
png


In [77]:
for value in file_counts.values():
    print(value)

10
3
28
5


**An example**  
Counting frequencies.  
Count the frequency of letters in a given string.

In [78]:
def count_letters(word):
    result = {}
    for letter in word:
        if letter not in result:
            result[letter] = 1
        else:
            result[letter] += 1
    return result

In [79]:
print(count_letters("python"))

{'p': 1, 'y': 1, 't': 1, 'h': 1, 'o': 1, 'n': 1}


In [80]:
print(count_letters("this is a large sentence"))

{'t': 2, 'h': 1, 'i': 2, 's': 3, ' ': 4, 'a': 2, 'l': 1, 'r': 1, 'g': 1, 'e': 4, 'n': 2, 'c': 1}
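
The `if`/`else` in `count_letters` can be shortened with the dictionary method `get`, which returns a default value when the key is missing. This is one possible variant, not a replacement for the version above; the name `count_letters_get` is made up for this sketch.

```python
def count_letters_get(word):
    result = {}
    for letter in word:
        # get returns 0 for a letter we have not seen yet
        result[letter] = result.get(letter, 0) + 1
    return result

print(count_letters_get("python"))
```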


## Dictionaries or Lists?

Which is best for a given situation?

Think about the data you want to store.  
If you've got a list of information that you need to collect or store, it is better to store them as a list.  
Suppose you have a list of names of people attending college with you. This is a typical case where you would like to use a list.  
On the other hand, if it is a guest list for a party and each person can bring some number of guests, then you might want to store it in a dictionary where the keys are the invitees and the values are the number of people coming with them.

Because of the way dictionaries work, it is pretty efficient to search for things inside a dictionary.  
If you have a dictionary and you want to check whether a key is present in it, the lookup takes roughly the same amount of time regardless of whether the dictionary has 10 entries or 10,000 entries.  
But in the case of a list, searching for an element in a list of 10 items is pretty fast compared to searching for a given item in a list of 10,000 items.

So in general, if you want to search if a specific item is present, it is better to use a dictionary to store your data.
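
The difference can be measured with the standard library's `timeit` module. The sketch below is a rough experiment rather than a benchmark: the collection size, the repeat count, and the helper name `time_lookup` are arbitrary choices, and the exact numbers will vary from machine to machine. The `lambda` is just a compact way to hand `timeit` the check it should repeat.

```python
import timeit

size = 10000
as_list = list(range(size))

# Build a dictionary whose keys are the same numbers
as_dict = {}
for number in as_list:
    as_dict[number] = True

def time_lookup(collection, item, repeats=1000):
    # Total time in seconds for `repeats` membership checks
    return timeit.timeit(lambda: item in collection, number=repeats)

# Look for the last item, which is the worst case for a list
list_time = time_lookup(as_list, size - 1)
dict_time = time_lookup(as_dict, size - 1)
print("list: {:.6f} s, dict: {:.6f} s".format(list_time, dict_time))
```

Changing `size` and rerunning the cell is a simple way to see how each lookup time responds as the collection grows.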

In lists we can store any data type.  
The same is true for the values in a dictionary, but the keys need to be immutable values (not lists or dictionaries).  
Thus, there is ample room to create complex datasets using the various data types that Python offers.  
Try to experiment with both lists and dictionaries in different cases until you feel confident enough to judge which will be a better fit in a given case.