# Exercises: Series and indexes in pandas
 
1. Create a series containing 10 random integers, with index a-j.
2. Retrieve the item at index b.
3. Retrieve the items at indexes c, d, and f.
4. What is the mean of the items at indexes a, e, g, and h?
5. What is the mean of the items with even numeric (positional) indexes?

In [9]:
import pandas as pd
import numpy as np

data = np.random.randint(low=0, high=100, size=10)
# index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
series = pd.Series(data=data, index=list('abcdefghij'))
series

a    98
b    22
c     6
d    69
e    85
f     7
g    29
h     7
i    55
j    30
dtype: int32

In [4]:
series.loc['b'] # Retrieve the item at index b.


22

In [11]:
# series.loc[list('cdf')] # Retrieve the items at indexes c, d, and f
series.loc[['c','d','f']]

c     6
d    69
f     7
dtype: int32

In [6]:
series.loc[['a', 'e', 'g', 'h']].mean() #  What is the mean of the items at indexes a, e, g, and h?

54.75

In [8]:
series.iloc[::2].mean() #  What is the mean of the items with even numeric (positional) indexes?

54.6
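
Because `np.random.randint` draws fresh values on every run, answers from cells executed at different times need not agree with the series printed above. The sketch below rebuilds the series from the printed values (copied literally, an assumption made only so the answers can be checked by hand) and repeats the four selections.

```python
import pandas as pd

# Values copied from the series printed above, not drawn at random.
series = pd.Series([98, 22, 6, 69, 85, 7, 29, 7, 55, 30],
                   index=list('abcdefghij'))

item_b = series.loc['b']                             # label-based lookup
items_cdf = series.loc[['c', 'd', 'f']]              # list of labels -> Series
mean_aegh = series.loc[['a', 'e', 'g', 'h']].mean()  # mean over chosen labels
mean_even_pos = series.iloc[::2].mean()              # positions 0, 2, 4, 6, 8

print(item_b, mean_aegh, mean_even_pos)
```

`.loc` selects by label and `.iloc` by position, so the last line does not depend on what the labels happen to be.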

# Exercises: Series and indexes
 
1. Create a new series, with 5 random integers from 0-100, and an index of a-e.
2. Create another new series, with 5 random integers from 0-100, and an index of b-f.
3. What happens when you add these together? Do so both getting `nan` and with a default of 0.
4. What happens when you multiply them together? Do so both getting `nan` and a default of 1. (Why not 0?)
5. Add together only b-c-d from each of these series.

In [16]:

# 1. Create a new series, with 5 random integers from 0-100, and an index of a-e.
s1 = pd.Series(np.random.randint(low=0, high=100, size=5), index=list('abcde'))
print(f"series1:\n{s1}")

# 2. Create another new series, with 5 random integers from 0-100, and an index of b-f.
s2 = pd.Series(np.random.randint(low=0, high=100, size=5), index=list('bcdef'))
print(f"series2:\n{s2}")

# 3. What happens when you add these together? Do so both getting `nan` and with a default of 0.
add_nans = s1 + s2
print(f"series w nan:\n{add_nans}")
add_zeros = s1.add(s2, fill_value=0)
print(f"series w/o nan:\n{add_zeros}")

# 4. What happens when you multiply them together? Do so both getting `nan` and a default of 1. (Why not 0?)
multiply_nans = s1 * s2
print(f"series * w nan:\n{multiply_nans}")
multiply_ones = s1.multiply(s2, fill_value=1)
print(f"series * w/o nan:\n{multiply_ones}")

# 5. Add together only b-c-d from each of these series.
sub_add = s1[['b', 'c', 'd']] + s2['b':'d']
print(sub_add)

series1:
a    18
b    30
c    26
d    28
e    12
dtype: int32
series2:
b    95
c    67
d    46
e    33
f    66
dtype: int32
series w nan:
a      NaN
b    125.0
c     93.0
d     74.0
e     45.0
f      NaN
dtype: float64
series w/o nan:
a     18.0
b    125.0
c     93.0
d     74.0
e     45.0
f     66.0
dtype: float64
series * w nan:
a       NaN
b    2850.0
c    1742.0
d    1288.0
e     396.0
f       NaN
dtype: float64
series * w/o nan:
a      18.0
b    2850.0
c    1742.0
d    1288.0
e     396.0
f      66.0
dtype: float64
b    125
c     93
d     74
dtype: int32
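
The "(Why not 0?)" in exercise 4 is not answered by the output alone. `fill_value` stands in for the missing operand, so it should be the identity of the operation: 0 for addition, 1 for multiplication. A minimal sketch, reusing the values printed above, shows that filling with 0 under multiplication would wipe out the unmatched entries:

```python
import pandas as pd

# Values copied from the printed series1 and series2.
s1 = pd.Series([18, 30, 26, 28, 12], index=list('abcde'))
s2 = pd.Series([95, 67, 46, 33, 66], index=list('bcdef'))

times_one = s1.mul(s2, fill_value=1)   # unmatched labels keep their own value
times_zero = s1.mul(s2, fill_value=0)  # unmatched labels collapse to 0

print(times_one.loc[['a', 'f']])
print(times_zero.loc[['a', 'f']])
```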


# Is a pandas DataFrame the same as a pandas Series?

No. A pandas DataFrame is a two-dimensional table, whereas a pandas Series is a one-dimensional labeled array. A Series can be thought of as a single column of a DataFrame.
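
To make the relationship concrete, the following sketch selects one column from a DataFrame and checks that it is a Series sharing the DataFrame's index. The column names `temp` and `city` and the row labels are made up for illustration.

```python
import pandas as pd

# Hypothetical two-column table used only for illustration.
df = pd.DataFrame({'temp': [85, 87, 90], 'city': ['A', 'B', 'C']},
                  index=['x', 'y', 'z'])

column = df['temp']          # selecting one column returns a Series

print(type(df).__name__, df.ndim)          # DataFrame 2
print(type(column).__name__, column.ndim)  # Series 1
print(column.index.equals(df.index))       # True
```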

# Exercises: Series and indexes
1. Create a series containing 10 random floats from 1-1,000.
2. Replace the numbers whose integer part is even with NaN.
3. Replace those NaN values with the mean of the remaining numbers.
4. What's the mean now?

Place all results in a single script, using f-strings to display the results clearly.

In [44]:
import numpy as np
import pandas as pd

# 1. Create a series containing 10 random floats from 1-1,000.
np.random.seed(7)
s = pd.Series(np.random.rand(10) * 1000)
print(f"1. Series of 10 random floats from 1-1,000:\n{s}")

1. Series of 10 random floats from 1-1,000:
0     76.308289
1    779.918792
2    438.409231
3    723.465178
4    977.989512
5    538.495870
6    501.120464
7     72.051133
8    268.438980
9    499.882501
dtype: float64


In [43]:
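# 2. Replace the numbers whose integer part is even with NaN.
# Note: int(x) raises ValueError when x is NaN, so this line fails if s
# already contains NaN (for example, when the cell is run a second time).
s.loc[s.apply(lambda x: int(x) % 2 == 0)] = np.nan
# Alternative: s.loc[s.astype(np.int64) % 2 == 0] = np.nan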
print(f"\n2. Series with even integer part replaced with NaN:\n{s}")

# 3. Replace those NaN values with the mean of the remaining numbers.
mean = s.mean(skipna=True)
s = s.fillna(mean)
print(f"\n3. Series with NaN values replaced with mean: \n{s}")

# 4. What's the mean now?
new_mean = s.mean(skipna=True)
print(f"\n4. The mean of the series now is {new_mean:.2f}")

ValueError: cannot convert float NaN to integer
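
The `ValueError` above is what `int(x)` raises when `x` is NaN, which happens if the cell runs on a series that already contains NaN (for example, when the cell is executed a second time). A NaN-tolerant sketch uses `np.floor`, whose result for NaN is NaN, so the comparison is simply `False` for those entries. The values are copied from the output of step 1, and the helper name `blank_even` is made up here; for the positive numbers used, the floor equals the integer part.

```python
import numpy as np
import pandas as pd

# Values copied from the series printed in step 1.
s = pd.Series([76.308289, 779.918792, 438.409231, 723.465178, 977.989512,
               538.495870, 501.120464, 72.051133, 268.438980, 499.882501])

def blank_even(series):
    """Return a copy with entries whose integer part is even set to NaN."""
    out = series.copy()
    out[np.floor(out) % 2 == 0] = np.nan   # NaN % 2 == 0 evaluates to False
    return out

s = blank_even(s)
s = blank_even(s)                 # running it again does not raise
mean_before = s.mean()            # NaN is skipped by default
s = s.fillna(mean_before)
print(f"mean before fill: {mean_before:.2f}, after fill: {s.mean():.2f}")
```

Filling the gaps with the mean of the remaining values leaves the mean unchanged, which answers step 4.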

In [49]:
%%timeit
import numpy as np
import pandas as pd

# 1. Create a series containing 10 random floats from 1-1,000.
np.random.seed(7)
s = pd.Series(np.random.rand(10) * 1000)
# print(f"1. Series of 10 random floats from 1-1,000:\n{s}")

s.loc[s.apply(lambda x: int(x) % 2 == 0)] = np.nan
# print(s)

3.18 ms ± 528 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [48]:
%%timeit
import numpy as np
import pandas as pd

# 1. Create a series containing 10 random floats from 1-1,000.
np.random.seed(7)
s = pd.Series(np.random.rand(10) * 1000)
# print(f"1. Series of 10 random floats from 1-1,000:\n{s}")

s.loc[s.astype(np.int64) % 2 == 0] = np.nan
# print(s)


2.99 ms ± 315 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
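
Both `%%timeit` cells above also time the imports and the series construction, which blurs the comparison between the two masks. A sketch using the standard-library `timeit` module keeps that setup out of the timed statement. The helper names `mask_apply` and `mask_astype` are made up here, and the timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(7)
s = pd.Series(np.random.rand(10) * 1000)

def mask_apply(series):
    # One Python-level call per element.
    return series.apply(lambda x: int(x) % 2 == 0)

def mask_astype(series):
    # Vectorized cast, then a vectorized modulo.
    return series.astype(np.int64) % 2 == 0

# Both approaches should flag the same elements.
same = bool((mask_apply(s) == mask_astype(s)).all())

runs = 200
t_apply = timeit.timeit(lambda: mask_apply(s), number=runs)
t_astype = timeit.timeit(lambda: mask_astype(s), number=runs)
print(f"same mask: {same}")
print(f"apply:  {t_apply / runs * 1e6:.1f} µs per call")
print(f"astype: {t_astype / runs * 1e6:.1f} µs per call")
```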


# Exercise: Weather data using a pandas Series and sort
 
1. Create a series with the projected high temps for your location over the next 10 days. The index should contain 4-character strings with the month (two digits) and the date (two digits). The values should be the temps.
2. Find descriptive statistics for the upcoming high temps.
3. Find the 3 highest temperatures that are expected.
4. Find the high temps starting on Monday and going through Wednesday.

In [56]:
import pandas as pd

# 1. Create series of projected high temps for next 10 days
temps = pd.Series([85, 87, 90, 89, 88, 92, 91, 85, 84, 82],
                  index=['0501', '0502', '0503', '0504', '0505', '0506',
                         '0507', '0508', '0509', '0510'])

# 2. Descriptive statistics
print(f"Descriptive statistics for the upcoming high temperatures:\n{temps.describe()}\n")

# 3. 3 highest temperatures expected
highest_temps = temps.sort_values(ascending=False)[:3]
print(f"The 3 highest temperatures expected are:\n{highest_temps}\n")

# 4. High temps from Monday through Wednesday
mon_wed_temps = temps['0507':'0509']
print(f"The high temperatures from Monday to Wednesday are:\n{mon_wed_temps}")

Descriptive statistics for the upcoming high temperatures:
count    10.000000
mean     87.300000
std       3.267687
min      82.000000
25%      85.000000
50%      87.500000
75%      89.750000
max      92.000000
dtype: float64

The 3 highest temperatures expected are:
0506    92
0507    91
0503    90
dtype: int64

The high temperatures from Monday to Wednesday are:
0507    91
0508    85
0509    84
dtype: int64
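
Typing the index by hand is error-prone. As a sketch, the labels can be generated with `pd.date_range` and `strftime('%m%d')`, which yields the 4-character month-and-day strings the exercise asks for, and `day_name()` can locate the Monday instead of reading it off a calendar. The year 2018 is an assumption, chosen only because May 7 falls on a Monday in that year, as in the solution above; `nlargest` is an alternative to sorting and slicing.

```python
import pandas as pd

# Assumed start date; the year is hypothetical (see the note above).
dates = pd.date_range('2018-05-01', periods=10)
temps = pd.Series([85, 87, 90, 89, 88, 92, 91, 85, 84, 82],
                  index=dates.strftime('%m%d'))

top3 = temps.nlargest(3)                       # three highest values

# Position of the first Monday, then three days starting there.
start = list(dates.day_name()).index('Monday')
mon_wed = temps.iloc[start:start + 3]

print(top3)
print(mon_wed)
```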


# Does method chaining make Python less efficient?
No, not by itself. A chain such as `s.dropna().sort_values()` makes the same method calls as the equivalent code written with intermediate variables, so the runtime is essentially the same; the chain only skips giving names to the intermediate results. The real cost of long chains is readability: they are harder to inspect and debug step by step, which can slow down development and maintenance. Whether to chain therefore depends on the task and on how the code reads.
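
A small sketch of that point: the chained expression and the step-by-step version below make the same method calls and produce the same result; the only difference is whether the intermediate objects get names. The sample values are made up.

```python
import pandas as pd

s = pd.Series([3.0, None, 1.0, 2.0, None])   # made-up sample data

# Chained: intermediates are never bound to a name.
chained = s.dropna().sort_values().reset_index(drop=True)

# Step by step: same calls, but each intermediate is easy to inspect.
no_nan = s.dropna()
ordered = no_nan.sort_values()
stepwise = ordered.reset_index(drop=True)

print(chained.equals(stepwise))
```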

# Exercises: pandas Series methods
 
1. Create a series containing 10 words (strings) of varying lengths.
2. Find all of the strings whose length is less than the mean length.
3. Find all of the strings whose length is odd.
4. Find the strings containing either 'a' or 'e'.

In [60]:
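# 1. A series of 10 words of varying lengths
words = pd.Series(['hello', 'world', 'pandas', 'training', 'with', 'python', 'series', 'reuven', 'timothy', 'newport'])

# 2. Strings shorter than the mean length
mean_length = words.str.len().mean()
short_words = words.loc[words.str.len() < mean_length]

# 3. Strings with an odd length
odd_length_words = words.loc[words.str.len() % 2 != 0]

# 4. Strings containing 'a' or 'e' (the pattern is a regular expression)
ae_words = words.loc[words.str.contains('[ae]')]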

print(f"Words: \n{words}\n")
print(f"Short words (length less than mean): \n{short_words}\n")
print(f"Words with odd lengths: \n{odd_length_words}\n")
print(f"Words containing 'a' or 'e': \n{ae_words}")

Words: 
0       hello
1       world
2      pandas
3    training
4        with
5      python
6      series
7      reuven
8     timothy
9     newport
dtype: object

Short words (length less than mean): 
0    hello
1    world
4     with
dtype: object

Words with odd lengths: 
0      hello
1      world
8    timothy
9    newport
dtype: object

Words containing 'a' or 'e': 
0       hello
2      pandas
3    training
6      series
7      reuven
9     newport
dtype: object
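
`str.contains` treats its pattern as a regular expression by default, which is why the character class `'[ae]'` matches either letter. As a sketch, the same filter can be written without regular expressions by combining two literal matches with `|`:

```python
import pandas as pd

words = pd.Series(['hello', 'world', 'pandas', 'training', 'with',
                   'python', 'series', 'reuven', 'timothy', 'newport'])

by_regex = words.loc[words.str.contains('[ae]')]          # regex character class
by_literal = words.loc[words.str.contains('a', regex=False)
                       | words.str.contains('e', regex=False)]

print(by_regex.equals(by_literal))
```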
