<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Python_Data_Analytics_Course/blob/main/1_Basics/14_List_Comprehensions.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# List Comprehensions

## Notes

* A way to create a new list (with shorter syntax) based on the values of an existing list or other iterable.

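The same syntax is not limited to lists:
- `set` comprehension
- `dictionary` comprehension
- generator expression (Python has no `tuple` comprehension; parentheses create a generator, which can be passed to `tuple()`)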
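
As a quick illustration of these other forms, here is a minimal sketch. The list `numbers` and the result names are made up for this example and are not part of the course dataset.

```python
numbers = [1, 2, 2, 3, 3, 3]  # illustrative input, not from the course data

# set comprehension: curly braces, duplicates are dropped
unique_squares = {n ** 2 for n in numbers}

# dictionary comprehension: curly braces with key: value pairs
square_lookup = {n: n ** 2 for n in numbers}

# generator expression: parentheses; wrap it in tuple() to get a tuple
squares_tuple = tuple(n ** 2 for n in numbers)

print(sorted(unique_squares))  # [1, 4, 9]
print(square_lookup)           # {1: 1, 2: 4, 3: 9}
print(squares_tuple)           # (1, 4, 4, 9, 9, 9)
```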

## Importance

List comprehensions provide a concise way to create lists. They are useful for manipulating and filtering data, including when working with pandas.

In [None]:
# Creating a list of numbers from 0 to 9
numbers = [x for x in range(10)]
numbers

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

## Example # 1

We're going to modify the example we used in our `for` loop. Instead of printing the whole statement "Position requires X years of experience", we'll print only the experience required. This is a simplified version of our earlier code.

In [None]:
# Minimum experience required for job positions
position_experience_requirements = [1, 2, 3]

# Iterate over each experience requirement in the list of job positions
for x in position_experience_requirements:
    print(x)

1
2
3


- The code above defines `position_experience_requirements` as a list of integers representing the minimum years of experience required for various job positions.
- The `for` loop goes through each item (`x`) in `position_experience_requirements` and prints it.

Now let's use a list comprehension to shorten this.

In [None]:
# Create a new list from the experience requirements
experience = [x for x in position_experience_requirements]

# The result is a list with the same values as the original
experience

[1, 2, 3]

This is pretty basic. So let's make it a bit more useful. I'm going to add in a variable that stores the user's years of experience.

In [None]:
user_experience = 2
user_experience

2

Now, we are adding an if condition to our list comprehension. This condition checks if the user's experience (`user_experience`) is greater than or equal to each item (`x`) in the `position_experience_requirements` list.

```python
if user_experience >= x
```

It keeps only the requirements that are less than or equal to the user's experience, that is, the positions the user qualifies for.

In [None]:
# Create a list of the experience requirements that the user meets
qualified_positions = [x for x in position_experience_requirements if user_experience >= x]

qualified_positions

[1, 2]
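
To make the filter easier to read, here is a sketch that writes the same logic as a regular `for` loop and compares it with the comprehension. The names `qualified_loop` and `qualified_comp` are introduced only for this comparison.

```python
position_experience_requirements = [1, 2, 3]
user_experience = 2

# Loop version: start with an empty list and append each requirement the user meets
qualified_loop = []
for x in position_experience_requirements:
    if user_experience >= x:
        qualified_loop.append(x)

# Comprehension version: the same filter in a single expression
qualified_comp = [x for x in position_experience_requirements if user_experience >= x]

print(qualified_loop == qualified_comp)  # True
```

The `if` at the end of the comprehension plays the same role as the `if` inside the loop: items that fail the condition are simply skipped.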

## Example # 2

This first code block extracts the data we need for this exercise; we'll dive into this later in the course.

For now, just understand that I'm extracting a list of job titles (`job_list`) from our dataset.

In [None]:
from datasets import load_dataset

# Load the dataset
dataset = load_dataset('lukebarousse/data_jobs')
df = dataset['train'].to_pandas()

# Create a list of job titles from the dataset
job_list = df['job_title'].tolist()

# Remove any non-string values from the list
job_list = [job for job in job_list if isinstance(job, str)]

Let's convert our previous `for` loop into a list comprehension!

In [None]:
# previous for loop
analyst_list = []

for job in job_list:
  if "Data Analyst" in job:
    analyst_list.append(job)

# show first 10 values
analyst_list[:10]

['Technical Data Analyst',
 'Sr. Data Analyst - Full-time / Part-time',
 'Data Analyst',
 'Data Analyst',
 'Data Analyst Junior settore Logistica',
 'Senior Data Analyst - Now Hiring',
 'Health Technology Data Analyst',
 'Data Analyst',
 'Love Excel? Junior Data Analyst for Real Estate',
 'Data Analyst']

However, that was 4 lines of code!

With a list comprehension we can do it in only 1.

In [None]:
analyst_list = [job for job in job_list if "Data Analyst" in job]

# show first 10 values
analyst_list[:10]

['Technical Data Analyst',
 'Sr. Data Analyst - Full-time / Part-time',
 'Data Analyst',
 'Data Analyst',
 'Data Analyst Junior settore Logistica',
 'Senior Data Analyst - Now Hiring',
 'Health Technology Data Analyst',
 'Data Analyst',
 'Love Excel? Junior Data Analyst for Real Estate',
 'Data Analyst']
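
If you want to try the two filtering steps without downloading the dataset, the sketch below uses a small made-up list. `sample_jobs` and `sample_analysts` are hypothetical names, and the titles are illustrative rather than a real slice of the data.

```python
# Hypothetical sample standing in for the dataset's job titles
sample_jobs = ['Technical Data Analyst', 'Data Engineer', None, 'Data Analyst', 'Senior Data Scientist']

# Drop non-string values first, as was done when building job_list
sample_jobs = [job for job in sample_jobs if isinstance(job, str)]

# Keep only the titles that contain "Data Analyst"
sample_analysts = [job for job in sample_jobs if "Data Analyst" in job]

print(sample_analysts)  # ['Technical Data Analyst', 'Data Analyst']
```

Note that the `in` check is case-sensitive, so a title such as "data analyst" in lowercase would not match.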

In [None]:
print("Job list is:     ", len(job_list), "jobs")
print("Analyst list is: ", len(analyst_list), "jobs")

Job list is:      787685 jobs
Analyst list is:  163124 jobs


In [54]:
# Dictionary comprehension: pair each key with the value at the same position
keys = ['a', 'b', 'c']
values = [1, 2, 3]
dictionary = {key: value for key, value in zip(keys, values)}
print(dictionary)

{'a': 1, 'b': 2, 'c': 3}


In [55]:
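def is_prime(num):
    # Prime: greater than 1 and not divisible by any integer from 2 up to its square root
    return num > 1 and all(num % i != 0 for i in range(2, int(num ** 0.5) + 1))

limit = 100
# Keep only the prime numbers below the limit
primes = [num for num in range(2, limit) if is_prime(num)]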
print(primes)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]


In [56]:
input_string = "The quick brown fox jumps over the lazy dog."
# Count each distinct character, skipping spaces (the order of a set is arbitrary)
char_frequency = {char: input_string.count(char) for char in set(input_string) if char != ' '}
print(char_frequency)

{'q': 1, 'i': 1, 'y': 1, 's': 1, 'z': 1, 'p': 1, 'c': 1, 'g': 1, 'm': 1, 'a': 1, 'w': 1, 'l': 1, '.': 1, 'h': 2, 'd': 1, 'r': 2, 'b': 1, 'u': 2, 'e': 3, 'x': 1, 't': 1, 'n': 1, 'v': 1, 'o': 4, 'k': 1, 'j': 1, 'T': 1, 'f': 1}


In [57]:
colors = ['red', 'green', 'blue']
sizes = ['S', 'M', 'L']
# Two for clauses: every color is paired with every size
cartesian_product = [(color, size) for color in colors for size in sizes]
print(cartesian_product)

[('red', 'S'), ('red', 'M'), ('red', 'L'), ('green', 'S'), ('green', 'M'), ('green', 'L'), ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]


In [59]:
words = ["listen", "enlist", "rat", "tar", "art", "evil", "vile", "silent", "lives", "lives"]
anagrams = {}
for word in words:
    key = tuple(sorted(word))
    if key not in anagrams:
        anagrams[key] = []
    anagrams[key].append(word)
print(anagrams)
# Using a dictionary comprehension to keep only the groups with more than one word
anagram_groups = {key: value for key, value in anagrams.items() if len(value) > 1}
print(anagram_groups)

{('e', 'i', 'l', 'n', 's', 't'): ['listen', 'enlist', 'silent'], ('a', 'r', 't'): ['rat', 'tar', 'art'], ('e', 'i', 'l', 'v'): ['evil', 'vile'], ('e', 'i', 'l', 's', 'v'): ['lives', 'lives']}
{('e', 'i', 'l', 'n', 's', 't'): ['listen', 'enlist', 'silent'], ('a', 'r', 't'): ['rat', 'tar', 'art'], ('e', 'i', 'l', 'v'): ['evil', 'vile'], ('e', 'i', 'l', 's', 'v'): ['lives', 'lives']}


In [60]:
nested_dict = {
    'a': [1, 2, 3],
    'b': [4, 5],
    'c': [6, 7, 8]
}
# Flatten the dictionary's lists into a single list
flattened_list = [i for j in nested_dict.values() for i in j]
print(flattened_list)

[1, 2, 3, 4, 5, 6, 7, 8]
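
The order of the `for` clauses in a nested comprehension can be confusing, so here is a sketch that spells the flattening out as nested loops. The names `flattened_loop`, `flattened_comp`, `values`, and `item` are illustrative.

```python
nested_dict = {'a': [1, 2, 3], 'b': [4, 5], 'c': [6, 7, 8]}

# Nested loops: the outer loop is written first, the inner loop second
flattened_loop = []
for values in nested_dict.values():
    for item in values:
        flattened_loop.append(item)

# The comprehension lists its for clauses in the same outer-to-inner order
flattened_comp = [item for values in nested_dict.values() for item in values]

print(flattened_loop == flattened_comp)  # True
```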


In [61]:
sentences = [
    "Hello World",
    "This is an example sentence.",
    "Another sentence to count vowels.",
]

# Map each sentence to its total number of vowels (case-insensitive)
vowel_counts = {sentence: sum(sentence.lower().count(vowel) for vowel in 'aeiou') for sentence in sentences}
print(vowel_counts)

{'Hello World': 3, 'This is an example sentence.': 9, 'Another sentence to count vowels.': 11}


In [62]:
words = ["level", "hello", "radar", "python", "world", "deified"]
# Keep only the words that read the same forwards and backwards
palindromes = [word for word in words if word == word[::-1]]
print(palindromes)

['level', 'radar', 'deified']


In [63]:
words = ["apple", "banana", "pear", "kiwi", "cherry", "blueberry"]
# For each distinct word length, collect the words of that length
grouped_by_length = {length: [word for word in words if len(word) == length] for length in set(len(word) for word in words)}
print(grouped_by_length)

{9: ['blueberry'], 4: ['pear', 'kiwi'], 5: ['apple'], 6: ['banana', 'cherry']}


