## Review from last time:

A **Data Structure** is a systematic way of organizing and storing data so that it can be accessed, updated, and processed efficiently.
- Same data + different data structure → very different performance
- Choosing a data structure is often more important than choosing a clever algorithm

A **tuple** in Python is a sequence of objects with two key characteristics:

- The number of objects in the tuple is fixed.
- The tuple is immutable: once it is created, its elements cannot be reassigned.

Tuples are defined as sequences of objects separated by commas and enclosed in parentheses ().

In [9]:
tuple_ex = (2, 4, 2)
#tuple_ex[0]=5  #uncomment to see the TypeError shown below: tuples do not allow item assignment
print(tuple_ex)
type(tuple_ex)



TypeError: 'tuple' object does not support item assignment
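
A minimal sketch of what the error above means in practice: item assignment on a tuple raises `TypeError`, and the usual workaround is to build a new tuple instead. The name `updated` is introduced here only for illustration.

```python
tuple_ex = (2, 4, 2)

try:
    tuple_ex[0] = 5          # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)

# To "change" a value, build a new tuple from the pieces you want to keep.
updated = (5,) + tuple_ex[1:]
print(updated)       # (5, 4, 2)
print(tuple_ex)      # (2, 4, 2), the original is unchanged
```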

## Practice
Gross Domestic Product (GDP) per capita is a country's GDP divided by its total population. The tuple GDP in the code cell below contains the USA's GDP per capita data from 1960 to 2021. The values are arranged chronologically, with the first value corresponding to 1960, the second to 1961, and so forth. Write a program to identify and print the years when the GDP per capita in the US increased by more than 10% compared to the previous year.

In [13]:
GDP = (3007, 3067, 3244, 3375,3574, 3828, 4146, 4336, 4696, 5032,5234,5609,6094,6726,7226,7801,8592,9453,10565,11674,12575,13976,14434,15544,17121,18237,19071,20039,21417,22857,23889,24342,25419,26387,27695,28691,29968,31459,32854,34515,36330,37134,37998,39490,41725,44123,46302,48050,48570,47195,48651,50066,51784,53291,55124,56763,57867,59915,62805,65095,63028,69288)

In [23]:
#Write your solution here
year=1960
for i in range(1,len(GDP)):
    diff=GDP[i]-GDP[i-1]
    if diff/GDP[i-1] > .1:
        print("greater", "Year", year+i)






greater Year 1973
greater Year 1976
greater Year 1977
greater Year 1978
greater Year 1979
greater Year 1981
greater Year 1984
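
The same year-over-year comparison can also be written by pairing each value with its predecessor using `zip`. This is only a sketch on a short made-up tuple: `sample`, `start_year`, and `jumps` are hypothetical names, and the values are not real GDP figures.

```python
# Hypothetical sample values, not real GDP figures.
sample = (100, 105, 120, 121, 140)
start_year = 2000

jumps = []
# zip(sample, sample[1:]) yields (previous, current) pairs.
for offset, (prev, curr) in enumerate(zip(sample, sample[1:]), start=1):
    growth = (curr - prev) / prev
    if growth > 0.10:
        jumps.append(start_year + offset)

print(jumps)  # [2002, 2004]
```

Pairing the values this way avoids indexing with `i` and `i-1`, at the cost of creating one sliced copy of the tuple.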


## More on Tuples
Tuples can be concatenated using the + operator:

In [25]:
(2,4,2) + ("a", "tuple") + ("mixing","datatypes is crazy",8)

(2, 4, 2, 'a', 'tuple', 'mixing', 'datatypes is crazy', 8)

Multiplying a tuple by an integer results in repetition of the tuple:

In [27]:
(2,4,2)*3

(2, 4, 2, 2, 4, 2, 2, 4, 2)

When a tuple is assigned to an expression with multiple variables, it is unpacked, and each variable is assigned a value based on the order of the elements in the tuple.

In [33]:
a,b,c  = (2.5, "a string", (("Nested tuple",8)))
print(a)
print(b)
print(c)

2.5
a string
('Nested tuple', 8)



If we only want some of the values from a tuple, we can use a starred name such as `*_` to absorb the values we do not need. For example, to extract only the first value and the last three values of a tuple:

In [39]:
a,*_,b,c,d  = (2.5, "a string", (("Nested tuple",8)),"98",99)
print(a)
print(b)
print(c)
print(d)

#You try something

2.5
('Nested tuple', 8)
98
99
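
A short sketch of how the starred name behaves: it always collects the leftover values into a list (possibly an empty one), so it can be given a real name when the middle values are needed. The variable names here are illustrative.

```python
record = (2.5, "a string", ("Nested tuple", 8), "98", 99)

first, *middle, last = record
print(first)    # 2.5
print(middle)   # ['a string', ('Nested tuple', 8), '98']
print(last)     # 99

# The starred name ends up empty when nothing is left over.
x, *rest, y = (1, 2)
print(rest)     # []
```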


##  Practice 
Again, use the GDP tuple that contains the USA’s GDP per capita data from 1960 to 2021, with values arranged in chronological order (i.e., the first value corresponds to 1960, the second to 1961, and so on).

Write a function with two parameters:

* Year: Specifies the starting year for the GDP per capita data in the second parameter.
* Tuple of GDP per capita values: A tuple containing the GDP per capita for consecutive years starting from the year provided in the first parameter.

The function should return a tuple of two elements:
* The first element is the count of years where the GDP per capita increased by more than 5%.
* The second element is the most recent year when the GDP per capita increase was more than 5%.

Call the function to determine the number of years and the most recent year where the GDP per capita increased by more than 5% since the year 2000. Store the number of years in a variable called num_years, and the most recent year in a variable called recent_year. Finally, print the values of num_years and recent_year.

In [49]:
#Write your solution here

def perGDP_5p(year, GDP):
    count=0
    recent_year=year
    for i in range(1,len(GDP)):
        diff=GDP[i]-GDP[i-1]
        if diff/GDP[i-1] > .05:
            count+=1
            recent_year=year+i
            #print("greater", "Year", year+i)

    return count, recent_year

    


a,b=perGDP_5p(1960, GDP)
print(a,b)

28 2021
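
The call above runs over the whole series from 1960. For the question as posed (since 2000), one possible sketch passes only the values from 2000 onward and unpacks the result into `num_years` and `recent_year`. To keep the block self-contained, the function is repeated and `GDP_2000` copies the last 22 values of `GDP` (2000 to 2021). The name `GDP_2000` is an assumption; in the notebook, `GDP[2000 - 1960:]` would give the same tuple.

```python
def perGDP_5p(year, GDP):
    count = 0
    recent_year = year
    for i in range(1, len(GDP)):
        diff = GDP[i] - GDP[i-1]
        if diff / GDP[i-1] > .05:
            count += 1
            recent_year = year + i
    return count, recent_year

# GDP per capita for 2000 through 2021, copied from the GDP tuple above.
GDP_2000 = (36330, 37134, 37998, 39490, 41725, 44123, 46302, 48050, 48570,
            47195, 48651, 50066, 51784, 53291, 55124, 56763, 57867, 59915,
            62805, 65095, 63028, 69288)

# Tuple unpacking stores the two returned values in separate variables.
num_years, recent_year = perGDP_5p(2000, GDP_2000)
print(num_years, recent_year)  # 3 2021
```

Because the tuple starts at 2000, the first comparison is 2001 against 2000; the change from 1999 to 2000 is not counted in this version.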


## Tuple methods

Two useful tuple methods are count, which returns the number of times an element appears in the tuple, and index, which gives the position of the first occurrence of an element in the tuple.

In [51]:
tuple_ex = (2,4,2,7,87,4,2,2)
tuple_ex.count(2)  #tip: press Tab after the dot to see the available methods

4

In [53]:
tuple_ex.index(2)

0
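
Two details of these methods are worth a quick sketch: `count` simply returns 0 for a missing value, while `index` raises `ValueError`, and `index` accepts an optional start position. The membership check below is one way to avoid the error.

```python
tuple_ex = (2, 4, 2, 7, 87, 4, 2, 2)

print(tuple_ex.count(4))      # 2
print(tuple_ex.count(100))    # 0, count never raises for a missing value

# index accepts an optional start position to search from.
print(tuple_ex.index(2, 1))   # 2, the first 2 at or after position 1

# index raises ValueError if the value is absent, so check membership first.
if 100 in tuple_ex:
    print(tuple_ex.index(100))
else:
    print("100 is not in the tuple")
```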

# List Data Structure

A **list** is a sequence of Python objects with two key characteristics that distinguish it from a tuple:
1. The number of objects is flexible, meaning items can be added or removed from a list.
2. The list is mutable, meaning its elements can be reassigned in place.

A list is defined as a sequence of Python objects separated by commas and enclosed in square brackets `[]`. For example, here’s a list containing three integers:

In [63]:
list_ex = [2,4,2]
print(list_ex)

[2, 4, 2]


### Adding and Removing Elements in a List

We can add elements to the end of a list using the `append` method. For example, we can append the string `'dog'` to the `list_ex` list as shown below:

In [65]:
list_ex.append('dog')
print(list_ex)

[2, 4, 2, 'dog']


Note that the elements of a list or a tuple can be of different data types.

To add an element at a specific position in a list, we can use the `insert` method. For example, to insert the number `2.42` as the second element in the `list_ex`, we can do the following:

In [67]:
list_ex.insert(1,2.42)
print(list_ex)

[2, 2.42, 4, 2, 'dog']


To remove an element from a list, we can use either the `pop` or the `remove` method. The `pop` method removes the element at a specific index (the last element if no index is given) and returns it, while the `remove` method removes the first occurrence of an element by its value. See the examples below:

In [71]:
a=list_ex.pop()  #if no index is given, it removes and returns the last element
print(a)
print(list_ex)

b=list_ex.pop(2)  #removes and returns the element at index 2
print(b)
print(list_ex)  #the last element and the item at index 2 have been removed

dog
[2, 2.42, 4, 2]
4
[2, 2.42, 2]


In [73]:
list_ex2 = [2,3,2,4,4]
list_ex2.remove(2)  #removes the first occurrence of the value 2
list_ex2


[3, 2, 4, 4]
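
A compact side-by-side sketch of the difference: `pop` works by position and hands back the removed element, while `remove` works by value, returns nothing, and raises `ValueError` if the value is absent. The list `letters` is a made-up example.

```python
letters = ['a', 'b', 'c', 'b']

removed = letters.pop(1)      # removes by position and returns the element
print(removed, letters)       # b ['a', 'c', 'b']

letters.remove('b')           # removes the first 'b' by value; returns None
print(letters)                # ['a', 'c']

# remove raises ValueError when the value is not in the list.
try:
    letters.remove('z')
except ValueError:
    print("'z' is not in the list")
```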

**You try:**  filter out the values greater than 100 from the list below

In [77]:
#filter out the values greater than 100

list_ex3 = list(range(95,106))
print(list_ex3)
list_ans=[]
for item in list_ex3:
    if item<=100:
        list_ans.append(item)

print(list_ans)





[95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105]
[95, 96, 97, 98, 99, 100]


## List Comprehension

**List comprehension** is a concise way to create a list from an existing iterable (such as a list, tuple, or range). It builds the new list by applying an expression to each element and, optionally, keeping only the elements that satisfy a condition. Data analysts use comprehensions all the time, so make sure you can both read and write them!

### Syntax

```python
new_list = [expression for item in iterable if condition]
```

- **`expression`**: The value or transformation to apply to each item in the iterable.
- **`item`**: A variable representing each element in the iterable.
- **`iterable`**: The source of elements (e.g., a list, range, or another iterable).
- **`condition`** *(optional)*: A filter to include only items that satisfy the condition.
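
To connect the four parts of the syntax to something familiar, here is a sketch of one comprehension next to the loop it stands for. The names `scaled` and `scaled_loop` are illustrative.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Comprehension: expression = x * 10, item = x, iterable = numbers, condition = x > 3
scaled = [x * 10 for x in numbers if x > 3]

# The equivalent loop: the condition becomes an if, the expression is appended.
scaled_loop = []
for x in numbers:
    if x > 3:
        scaled_loop.append(x * 10)

print(scaled)        # [40, 50, 60]
print(scaled_loop)   # [40, 50, 60]
```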

In [85]:
# Examples
numbers = [1, 2, 3, 4, 5, 6]

# Create a List of Squares using numbers
sq= [ x*x    for x in numbers]

print(sq)

#Filter the evens

evens= [ x  for x in numbers if x%2==0 ]
print(evens)




[1, 4, 9, 16, 25, 36]
[2, 4, 6]


In [99]:
words = ["hello", "world", "python"]
#Transform the strings to uppercase
print(words[0].upper())

uppers= [word.upper()  for word in words]
print(uppers)

firstlet=[word[0].upper()+word[1:]  for word in words]
print(firstlet)




HELLO
['HELLO', 'WORLD', 'PYTHON']
['Hello', 'World', 'Python']


In [101]:
#You can even get crazier and apply multiple conditions
numbers = range(10)
filtered_numbers = [x for x in numbers if x % 2 == 0 and x > 3]
print(filtered_numbers)  # Output is [4, 6, 8]

[4, 6, 8]


In [None]:
#  Nested Loop Example

pairs = [(x, y) for x in [1, 2, 3] for y in [4, 5, 6]]
print(pairs)
# Output is [(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]
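
The order of the `for` clauses in a nested comprehension matches the order of the equivalent nested loops: the leftmost `for` is the outer loop. A small sketch that checks this (the name `pairs_loop` is illustrative):

```python
pairs = [(x, y) for x in [1, 2, 3] for y in [4, 5, 6]]

pairs_loop = []
for x in [1, 2, 3]:        # leftmost `for` in the comprehension: outer loop
    for y in [4, 5, 6]:    # next `for` in the comprehension: inner loop
        pairs_loop.append((x, y))

print(pairs == pairs_loop)  # True
```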

In [105]:
# Create a list of tuples, where each tuple
# consists of a number and its square, for integers from 5 to 14 (range(5, 15) stops before 15).

pairs= [(x,x*x) for x in range(5,15)]
print(pairs)



[(5, 25), (6, 36), (7, 49), (8, 64), (9, 81), (10, 100), (11, 121), (12, 144), (13, 169), (14, 196)]


### Why List Comprehension?

1. **Readability**: Makes the code more concise and easier to understand.
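2. **Performance**: Often somewhat faster than an equivalent `for` loop that calls `append`, because in CPython the comprehension avoids looking up and calling the method on every iteration. The gain is usually modest.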
3. **Compactness**: Reduces the number of lines of code.

In [None]:
# Using a for loop
numbers = [1, 2, 3, 4, 5]
squares = []
for x in numbers:
    squares.append(x ** 2)
print(squares)  # Output is [1, 4, 9, 16, 25]

#Versus using list comprehension
numbers = [1, 2, 3, 4, 5]
squares = [x ** 2 for x in numbers]
print(squares)  # Output is [1, 4, 9, 16, 25]

# List comprehension is syntactic sugar!  It makes your code look SWEET!  
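
The performance point can be checked rather than taken on faith. Below is a rough sketch using the standard `timeit` module; the function names and the sizes are arbitrary choices, and the timings will vary by machine and Python version, so no particular numbers are claimed.

```python
import timeit

def with_loop(n):
    squares = []
    for x in range(n):
        squares.append(x ** 2)
    return squares

def with_comprehension(n):
    return [x ** 2 for x in range(n)]

n = 10_000
# Run each version 100 times and report the total elapsed seconds.
loop_time = timeit.timeit(lambda: with_loop(n), number=100)
comp_time = timeit.timeit(lambda: with_comprehension(n), number=100)
print(f"loop: {loop_time:.3f}s  comprehension: {comp_time:.3f}s")
```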


### Practice Problems

Suppose you surveyed students at Rhodes and asked at what age they expect to get married, producing the following dataset. Notice that it contains noisy entries such as '30+', 'Never', and '35-40'.

In [None]:
marriage_age = ['24', '30', '28', '29', '30', '27', '26', '28', '30+', '26', '28', '30', '30', '30', 'probably never', 
'30', '25', '25', '30', '28', '30+ ', '30', '25', '28', '28', '25', '25', '27', '28', '30', '30', '35', '26', '28', '27', 
'27', '30', '25', '30', '26', '32', '27', '26', '27', '26', '28', '37', '28', '28', '28', '35', '28', '27', '28', '26', 
'28', '26', '30', '27', '30', '28', '25', '26', '28', '35', '29', '27', '27', '30', '24', '25', '29', '27', '33', '30', 
'30', '25', '26', '30', '32', '26', '30', '30', 'I wont', '25', '27', '27', '25', '27', '27', '32', '26', '25', 'never', 
'28', '33', '28', '35', '25', '30', '29', '30', '31', '28', '28', '30', '40', '30', '28', '30', '27', 'by 30', '28', 
'27', '28', '30-35', '35', '30', '30', 'never', '30', '35', '28', '31', '30', '27', '33', '32', '27', '27', '26', 'N/A', 
'25', '26', '29', '28', '34', '26', '24', '28', '30', '120', '25', '33', '27', '28', '32', '30', '26', '30', '30', '28', 
'27', '27', '27', '27', '27', '27', '28', '30', '30', '30', '28', '30', '28', '30', '30', '28', '28', '30', '27', '30', 
'28', '25', 'never', '420', '28', '28', '33', '30', '28', '28', '26', '30', '26', '27', '30', '25', 'Never', '27', '27', 
'25','not', '35-40','23','22']

#filter out all strings that are not integers
#make marriage_age a list of ints





#Cap the values greater than 80 to 80 or make some reasonable deduction here.




#What is the mean age when people expect to marry?



#Determine the percentage of people who expect to marry at an age of 30 or more.
