## Data Structures


We have covered the basics of Python's primitive data types in detail. It's now useful to consider how these basic types can be collected in ways that are meaningful and useful for a variety of tasks. Data structures are a fundamental component of programming: each is a collection of data elements that adheres to certain properties, depending on its type. In these notes, we'll present three basic data structures: the list, the set, and the dictionary. Python's data structures are very rich, and a full treatment is beyond the scope of this simple primer. Please see [the documentation](http://docs.python.org/2/tutorial/datastructures.html) for a more complete view.


### Lists

(Readings: LPTHW, Examples 32-34, and 38)

A list, sometimes called an array or a vector, is an ordered collection of values. The value of a particular element in a list is retrieved by querying for a specific index into the list. Lists allow duplicate values, but indices are unique. In Python, as in most programming languages, list indices start at 0; that is, to get the first element in a list, request the element at index 0. Lists provide very fast access to elements at specific positions, but are inefficient at "membership queries," i.e., determining whether an element is in the list, because Python has to compare the value against the elements one by one.
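A minimal sketch of the two operations mentioned above, positional access and a membership query, using a small made-up list of names as sample data:

```python
# A small list of names; positions are counted from 0
names = ["Anna", "Chris", "John", "Mary"]

# Access by position: a direct lookup, however long the list is
first = names[0]   # "Anna"
third = names[2]   # "John"

# Membership query with the `in` operator: Python compares the value
# against the elements one by one until it finds a match
has_mary = "Mary" in names    # True
has_zoe = "Zoe" in names      # False

print(first, third, has_mary, has_zoe)   # Anna John True False
```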

In Python, lists are specified by square brackets, `[ ]`, containing zero or more values, separated by commas. Lists are one of the most commonly used data structures, and they are often returned by other functions, for instance:

`a_string.split(" ")`

will take a string, split it on every space, and return a list of the resulting substrings.
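The separator you pass matters. A short sketch with a made-up string, comparing `split(" ")` with `split()` called without arguments; the latter may be handier for the word-counting exercises in these notes:

```python
text = "Data  structures\nare fun"

# Splitting on a single space keeps the empty string produced by the
# double space, and leaves the newline inside one of the pieces
print(text.split(" "))   # ['Data', '', 'structures\nare', 'fun']

# With no argument, split() splits on any run of whitespace
# (spaces, tabs, newlines) and drops the empty strings
print(text.split())      # ['Data', 'structures', 'are', 'fun']
```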

To query a specific value from a list, put the requested index in square brackets after the name of the list. Negative indices can be used to traverse the list from the right. (Remember how we accessed the individual characters of a string? It works exactly the same way. Strings and lists are both *sequences* in Python, so they share the same indexing and slicing syntax. Unlike lists, however, strings cannot be modified after they are created.)
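Strings and lists share the same indexing syntax, but they are not the same type. A small sketch (the word and letters are arbitrary sample data) showing the shared indexing and the one big difference, mutability:

```python
word = "Python"
letters = ["P", "y", "t", "h", "o", "n"]

# The same indexing syntax works on both sequences
print(word[0], letters[0])     # P P
print(word[-1], letters[-1])   # n n

# Lists are mutable: we can replace an element in place
letters[0] = "J"
print(letters)                 # ['J', 'y', 't', 'h', 'o', 'n']

# Strings are immutable: item assignment raises a TypeError
try:
    word[0] = "J"
except TypeError as err:
    print("TypeError:", err)
```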

In [None]:
my_string = "Wow these data structures make for exciting dinner conversation"
list_of_words = my_string.split(" ")
print(list_of_words)

In [None]:
a_list = [1, 2, 3, 0, 5, 10, 11]
print(a_list)

In [None]:
a_list = ["Panos", "Maria", "Anna", "James"]
print(a_list)

In [None]:
empty_list = []
print(empty_list)

#### Accessing parts of a list: Indexing and Slicing revisited

In [None]:
another_list = ["a", "b", "c", "d", "e"]
print(another_list[1])
print(another_list[2:4])

In [None]:
a_list = [1, 2, 3, 0, 5, 10, 11]
print(a_list[-1]) # indexing from the right

In [None]:
print(a_list[-3:])
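The slices above follow a few simple rules. A sketch on the same `a_list`, showing that the start index is included while the stop index is excluded, what happens when either bound is omitted, and the optional third value, the step:

```python
a_list = [1, 2, 3, 0, 5, 10, 11]

# a_list[start:stop] includes index start but excludes index stop
print(a_list[2:4])    # [3, 0]

# Omitting start or stop means "from the beginning" / "to the end"
print(a_list[:3])     # [1, 2, 3]
print(a_list[4:])     # [5, 10, 11]  (same elements as a_list[-3:])

# An optional third value is the step
print(a_list[::2])    # [1, 3, 5, 11]
print(a_list[::-1])   # [11, 10, 5, 0, 3, 2, 1]
```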

#### Exercise

You are given a list of names (on Slack) as one big, multiline string.
* Use the `split()` method to separate it into lines and get back the list of names.
* Extract the 3rd name from the list
* Extract the penultimate (second from the last) name
* Retrieve the 5th to 10th names from the list
* Retrieve the last 10 names.

#### Some common functions for lists

+ `list.append(x)`: add an element to the end of a list
+ `list_1.extend(list_2)`: add all elements of the second list to the end of the first list
+ `len(list)`: return the number of elements in the list

In [None]:
# Example of append
a_list = ["Panos", "John", "Chris", "Josh", "Mary", "Anna"]
a_list.append("Elena")
a_list.append("Sofia")
print(a_list)

In [None]:
# Compare append vs extend; notice that "extend" does not create nesting here
a1_list = ["Panos", "John", "Chris", "Josh", "Mary", "Anna"]
a2_list = ["Elena", "Sofia"]
a1_list.extend(a2_list)
print(a1_list)

In [None]:
# Notice that append behaves differently when we pass it a list:
# it adds the whole list as a single element, creating a "nested" list.
# We will examine nested lists later
b1_list = ["Panos", "John", "Chris", "Josh", "Mary", "Anna"]
b2_list = ["Elena", "Sofia"]
b1_list.append(b2_list)
print(b1_list)

In [None]:
# Notice that the two lists have different lengths
print("Length of a1_list:", len(a1_list))
print("Length of b1_list:", len(b1_list))

#### Exercise

Find out how many names you had in the list of names that you created earlier.

#### Sorting and reversing lists

* `list.sort()`: sorts the items of the list in place
* `list.reverse()`: reverses the order of the list in place

In [None]:
# Sort example
b_list = ["Panos", "John", "Chris", "Josh", "Mary", "Anna"]
b_list.sort()
print(b_list)

In [None]:
# Reverse example
b_list = ["Panos", "John", "Chris", "Josh", "Mary", "Anna"]
b_list.reverse()
print(b_list)
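Because `sort()` works in place, it does not hand back the sorted list; a common surprise is getting `None` from it. A sketch contrasting it with the built-in `sorted()` function, which returns a new list and leaves the original alone:

```python
b_list = ["Panos", "John", "Chris", "Josh", "Mary", "Anna"]

# sort() changes the list in place and returns None
result = b_list.sort()
print(result)      # None
print(b_list)      # ['Anna', 'Chris', 'John', 'Josh', 'Mary', 'Panos']

# sorted() leaves its input alone and returns a new, sorted list
c_list = ["Panos", "John", "Chris"]
new_list = sorted(c_list)
print(c_list)      # ['Panos', 'John', 'Chris']
print(new_list)    # ['Chris', 'John', 'Panos']
```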

#### Exercise

* Sort `b_list` in reverse alphabetical order. Use the `sort()` and `reverse()` functions.

In [None]:
b_list = ["Panos", "John", "Chris", "Josh", "Mary", "Anna"]
# Your code here
print(b_list)

#### List concatenation and "multiplication"

In [None]:
# List concatenation. This is similar to "extend", but "+" returns a new list
# and leaves a_list and b_list unchanged
a_list = ["Panos", "John", "Chris"]
b_list = ["Josh", "Mary", "Anna"]
print(a_list + b_list)

In [None]:
# List multiplication
a_list = ["Panos", "John", "Chris"]
print(3*a_list)
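The concatenation comment above compares `+` with `extend`; the practical difference is whether the original list changes. A short sketch, reusing the same sample names:

```python
a_list = ["Panos", "John", "Chris"]
b_list = ["Josh", "Mary", "Anna"]

# + builds a brand-new list; a_list is unchanged
combined = a_list + b_list
print(len(combined), len(a_list))   # 6 3

# extend() modifies a_list in place (and returns None)
a_list.extend(b_list)
print(len(a_list))                  # 6

# * repeats the elements of the list
print(2 * ["x", "y"])               # ['x', 'y', 'x', 'y']
```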

#### Finding things in lists

* `list.index(x)`: looks through the list for the specified element and returns the position of its first occurrence; if the element is not found, it raises a `ValueError`
* `list.count(x)`: counts the number of occurrences of the input element
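A sketch of `index()` on a list with a duplicate, showing that only the first match is reported and how the error for a missing element can be caught ("Zoe" is just an arbitrary name that is not in the list):

```python
b_list = ["Panos", "John", "Chris", "Josh", "Mary", "Anna", "John"]

# index() returns the position of the FIRST match only
print(b_list.index("John"))    # 1

# Searching for a missing element raises a ValueError
try:
    b_list.index("Zoe")
except ValueError:
    print("Zoe is not in the list")
```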

In [None]:
# Count
b_list = ["Panos", "John", "Chris", "Josh", "Mary", "Anna", "John"]
print("# of Panos in the list", b_list.count("Panos"))
print("# of John  in the list", b_list.count("John"))

In [None]:
# Can you figure out what we do here?
b_list.extend(b_list)
print("# of Panos in the list", b_list.count("Panos"))
print("# of John  in the list", b_list.count("John"))

#### Exercise

* Find your name in the list. In what position does your name appear?
* Try to find a name that does not exist in the list. What do you get?

Now let's practice the `if-else` statement:

* Define a variable `search` with the name that you want to search for.
* Check if the name appears in the list (Hint: Use the `count()` command)
    * If yes, then return the index number (Hint: Use the `index()` command)
    * If not, print that the name does not appear in the list

In [None]:
search = 'Panos Ipeirotis'

# Check if the name appears in the list / Use the .count() command
# If yes, then return the index number / Use the .index() command
# If not, print that the name does not appear in the list

#### Adding and removing items in the list

* `list.insert(index, x)`: insert element `x` into the list at the specified index. The element previously at that index, and all elements after it, shift one position to the right
* `list.pop(index)`: remove and return the element at the specified position (the last element, if no index is given)
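A minimal sketch of `insert()` and `pop()` on a small list of letters (a separate sample list, so it does not interfere with the exercise below):

```python
letters = ["a", "b", "d"]

# Insert "c" at index 2; "d" shifts one position to the right
letters.insert(2, "c")
print(letters)            # ['a', 'b', 'c', 'd']

# pop(index) removes the element at that position and returns it
removed = letters.pop(0)
print(removed, letters)   # a ['b', 'c', 'd']

# pop() with no index removes and returns the last element
last = letters.pop()
print(last, letters)      # d ['b', 'c']
```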


#### Exercise

* Add the letter "d" to `another_list` and print the result
* Add the letter "c" to `another_list` and print the result
* If you search for "c" in `another_list` using the `list.index(x)` method, what is the result?
* Sort `another_list` and print the result
* Use the string method `split()` (which we learned earlier) to count the number of words in the sentence "Python is the word. And on and on and on and on..."

In [None]:
another_list = ["a", "b", "c"]
# your code here

In [None]:
sentence = "Python is the word. And on and on and on and on..."
# your code here

#### Functions that apply to lists

* `len`: We have already seen that `len(list)` returns the number of elements in a list.
* `sum`: The function `sum(list)` sums up all the (numeric) elements of a list
* `max`: Returns the maximum element of a list
* `min`: Returns the minimum element of a list

In [None]:
nums = [3, 41, 12, 9, 74, 15]
print("Length:", len(nums))

In [None]:
print("Max:", max(nums))

In [None]:
print("Min:", min(nums))

In [None]:
print("Sum:", sum(nums))

#### Exercise

* How many names are in the list of names?
* How many _words_ are in the list of names?
* What is the result of the `min` and `max` functions when applied to a list of names?

* Write code that computes the average value of a list of numbers
* Write code that computes the median value of a list of numbers

#### Exercise

Let's apply what we have learned so far to the article below.

In [None]:
washington_post = """MOSCOW — Russian officials vehemently defended the country’s airstrikes in Syria on Thursday as blows to Islamic State militants even as evidence mounted suggesting that U.S.-backed rebels and others were facing the brunt of Moscow’s attacks.
And while Russian officials and diplomats rallied behind President Vladimir Putin, the Kremlin’s stance appeared further clouded by acknowledgments that the missions have already extended beyond solely the Islamic State.
In Paris, the Russian ambassador to France, Alexander Orlov, said the Russian attacks also targeted an al-Qaeda-linked group, Jabhat al-Nusra, or al-Nusra Front.
Syria’s ambassador to Russia, Riad Haddad, echoed that the joint hit list for Russia and the Syrian government included Jabhat al-Nusra, which is believed to have some coordination with the Islamic State but is still seen mostly as a rival.
“We are confronting armed terrorist groups in Syria, regardless of how they identify themselves, whether it is Jabhat al-Nusra, the ISIL or others,” he said, using one of the acronyms for the Islamic State.
“They all are pursuing ISIL ends,” he added, according to the Interfax news agency.
The ambassadors did not specifically mention any U.S.- and Western-backed rebel groups.
But the comment was certain to deepen suspicions by Washington and allies that Putin’s short-term aim is to give more breathing space to Syria’s embattled President Bashar al-Assad, whose government is strongly supported by Moscow.
Syrian activists, meanwhile, ramped up their own claims that Moscow was hitting groups seeking to bring down Assad, who has managed to hang on during more than four years of civil war.
Russia’s expanding military intervention in Syria added urgency to separate efforts by Russia and U.S. officials to coordinate strategies against the Islamic State and avoid potential airspace missteps between the two powers — so-called “deconfliction” talks. The Pentagon said the discussions will begin Thursday.
One monitoring group, the Britain-based Syrian Observatory for Human Rights, said Russian airstrikes again struck strongholds of an American-backed rebel group, Tajamu Alezzah, in central Hama province.
The actions, quickly criticized by Washington, add an unpredictable element to a multilayered war.
The observatory also reported that airstrikes hit the northwestern city Jisr al-Shughour, which is in the hands of rebel groups including al-Nusra, after battles last month to drive back Assad’s forces.
Among the locations hit was a site near Kafr Nabl, the northern Syrian town whose weekly protests against the government, often featuring pithy slogans in English, won it renown as a symbol of what began as a peaceful protest movement against the Assad regime. The local council receives U.S. assistance, and the rebel unit there has received support under a covert CIA program aimed at bolstering moderate rebels.
Raed Fares, one of the leaders of the protest movement in Kafr Nabl, said warplanes struck a Free Syrian Army checkpoint guarding Roman ruins on the outskirts of the town. He said the explosion was bigger than anything local residents had seen in three years of airstrikes conducted by Syrian warplanes.
“It made a fire six kilometers wide,” he told The Washington Post.
Other sites hit on the second day of Russian bombing included locations in the province of Hama. The targets suggested the main intention of the strikes was to shore up government control over a corridor of territory linking the capital, Damascus, to the Assad family’s coastal heartland, where the Russians are operating out of an expanded air base.
Syrian rebels, some of them U.S.-backed, had been making slow but steady gains in the area, considered one of the government’s biggest vulnerabilities. There has been no Islamic State presence there since January 2014, when moderate rebels rose up against the extremists and forced them to retreat to eastern Syria.
In Washington, Sen. John McCain (R-Ariz.) told CNN he could “absolutely confirm” that airstrikes hit Western-backed groups such as the Free Syrian Army and other factions “armed and trained by the CIA.”
“We have communications with people there,” said McCain, chairman of the Senate Armed Services Committee.
The accounts could not be independently assessed, but the main focus of the Russian attacks appeared to be in areas not known to have strong Islamic State footholds.
In Moscow, the reply was blunt.
“Total rubbish,” Gennady Zyuganov, a member of parliament and leader of Russia’s Communist Party, said of the U.S. accusations.
In televised remarks Thursday, Putin called accusations that Russian airstrikes had killed civilians in Syria “information attacks.”
He also addressed concerns about an accidental military clash between Russian and U.S.-led coalition forces, saying that his intelligence and military agencies were “establishing contacts” with counterparts in the United States.
“This work is ongoing, and I hope that it will conclude with the creation of a regularly acting mechanism,” he said.
A spokesman for Russia’s Defense Ministry, Igor Konashenkov, said Thursday that warplanes hit a dozen Islamic State sites in the past 24 hours, destroying targets including a command center and two arms depots.
The United States and Russia agree on the need to fight the Islamic State but not about what to do with the Syrian president. The Syrian civil war, which grew out of an uprising against Assad, has killed more than 250,000 people since March 2011 and sent millions of refugees fleeing to countries in the Middle East and Europe.
Accusing Russia of “pouring gasoline on the fire,” Defense Secretary Ashton B. Carter vowed that U.S. pilots would continue their year-long bombing campaign against the Islamic State in Syria, despite Moscow’s warning that American planes should stay away from its operations.
“I think what they’re doing is going to backfire and is counterproductive,” Carter said on Wednesday.
Yet Russia’s military flexing in Syria brought quick overtures from neighboring Iraq, where the Islamic State also holds significant territory but the government is within Washington’s fold.
Iraq’s prime minister, Haider al-Abadi, told France 24 that he “would welcome” Russia joining the U.S.-led airstrikes against Islamic State targets, but there have been no specific discussions.
Joining the protests against the Russian airstrikes was Saudi Arabia, a leading foe of Assad and one of Washington’s top Middle East allies.
At the United Nations late Wednesday, the Saudi ambassador, Abdallah al-Mouallimi, demanded that the Russian air campaign “stop immediately” and accused Moscow of carrying out attacks in areas outside the control of the Islamic State.
In Iran, Assad’s main regional backer, Foreign Ministry spokeswoman Marzieh Afkham called Russia’s military role a step “toward resolving the current crisis” in Syria.
Sly reported from Beirut, and Murphy from Washington. Daniela Deane in London, William Branigin in Washington and Loveday Morris in Baghdad contributed to this report.
"""

* What is the length of the document above in characters? In words? In paragraphs?
* What is the average length of a paragraph in words?
* What is the average length of a word in characters? (Remember that the document contains spaces and newlines, which should not count as parts of a word.)

In [None]:
# Your code here

### Tuples

A tuple is similar to a list and consists of a number of values separated by commas. For instance:

In [None]:
t = (12345, 54321, 54321, 'hello!')
print(t)
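Since a tuple is defined by its commas, the parentheses can often be left out, while a single value in parentheses is not a tuple at all. A small sketch illustrating both points:

```python
# The commas, not the parentheses, are what make a tuple
t = 12345, 54321, 'hello!'
print(type(t), t)        # <class 'tuple'> (12345, 54321, 'hello!')

# A one-element tuple needs a trailing comma
single = (42,)
not_a_tuple = (42)       # just the integer 42 in parentheses
print(type(single), type(not_a_tuple))   # <class 'tuple'> <class 'int'>
```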

The usual slicing and indexing operators still apply:

In [None]:
print(t[3])

Similarly, we can use the `count` and `index` methods.

In [None]:
print(t.index('hello!'))

However, a tuple is *immutable*: once created, its contents cannot be modified. So the list methods that change a list in place (such as `append`, `insert`, `sort`, or `pop`) do not apply to a tuple.
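A sketch of what immutability means in practice, using the same tuple as above: read-only methods such as `count` work, assignment fails, and the usual workaround is to build a list from the tuple, change the list, and convert it back:

```python
t = (12345, 54321, 54321, 'hello!')

# count() works just as it does for lists
print(t.count(54321))    # 2

# Trying to change an element raises a TypeError
try:
    t[0] = 99
except TypeError as err:
    print("TypeError:", err)

# To "modify" a tuple, build a list from it, change the list,
# and (if needed) turn it back into a new tuple
as_list = list(t)
as_list.append('world')
t2 = tuple(as_list)
print(t2)                # (12345, 54321, 54321, 'hello!', 'world')
```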