In [1]:
import pandas as pd

### Intro to Series

Let's create our first Series, using company names as the index and their revenues as the values:

In [2]:
companies = [
    'Apple', 'Samsung', 'Alphabet', 'Foxconn',
    'Microsoft', 'Huawei', 'Dell Technologies',
    'Meta', 'Sony', 'Hitachi', 'Intel',
    'IBM', 'Tencent', 'Panasonic'
]

In [3]:
s = pd.Series([
    274515, 200734, 182527, 181945, 143015,
    129184, 92224, 85965, 84893, 82345,
    77867, 73620, 69864, 63191],
    index=companies,
    name="Top Technology Companies by Revenue")

In [4]:
s

Apple                274515
Samsung              200734
Alphabet             182527
Foxconn              181945
Microsoft            143015
Huawei               129184
Dell Technologies     92224
Meta                  85965
Sony                  84893
Hitachi               82345
Intel                 77867
IBM                   73620
Tencent               69864
Panasonic             63191
Name: Top Technology Companies by Revenue, dtype: int64
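
A Series can also be built from a dictionary, which keeps each label next to its value. The snippet below is a minimal sketch using only three of the companies above; the names `revenue_by_company` and `s_from_dict` are made up for illustration.

```python
import pandas as pd

# Hypothetical subset of the revenue figures above, keyed by company name.
revenue_by_company = {"Apple": 274515, "Samsung": 200734, "Alphabet": 182527}

# Dict keys become the index labels; dict values become the data.
s_from_dict = pd.Series(
    revenue_by_company,
    name="Top Technology Companies by Revenue",
)
print(s_from_dict)
```

This form makes it harder to pair a value with the wrong label than passing two separate lists does.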

##### 1. Check your knowledge: build a series

Create a series called `my_series`

In [5]:
my_series = pd.Series([9, 11, -5], index=['a', 'b', 'c'], name='My First Series')
my_series

a     9
b    11
c    -5
Name: My First Series, dtype: int64

### Basic selection and location

#### Selecting by index:

In [6]:
s['Apple']

274515

`.loc` is the preferred way, because it is explicitly label-based:

In [7]:
s.loc['Apple']

274515

#### Selection by position:

In [8]:
s.iloc[0]

274515

In [9]:
s.iloc[-1]

63191
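
The difference between selecting by label and by position is easiest to see when the labels are themselves integers. The sketch below uses a small made-up Series `t` (not part of the revenue data) whose integer labels are deliberately out of order.

```python
import pandas as pd

# Hypothetical Series whose labels are integers in a shuffled order.
t = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = t.loc[0]      # the element whose label is 0
by_position = t.iloc[0]  # the first element, whatever its label is

print(by_label, by_position)
```

Using `.loc` and `.iloc` explicitly removes any doubt about which of the two lookups is meant.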

#### Errors in selection:

In [10]:
# this code will fail
s.loc["Non existent company"]

KeyError: 'Non existent company'

In [11]:
# This code also fails: position 132 is out of bounds
# (the Series has only 14 elements)
s.iloc[132]

IndexError: single positional indexer is out-of-bounds

We can guard against a missing label with the membership check `in`, which tests the index labels (not the values):

In [12]:
"Apple" in s

True

In [13]:
"Snapchat" in s

False
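
Putting the checks above to work, here is one possible way to wrap both kinds of lookup so they return a default instead of raising. The helper names `safe_by_label` and `safe_by_position` are hypothetical; `Series.get` is the built-in shortcut for the label case.

```python
import pandas as pd

revenues = pd.Series([274515, 200734], index=["Apple", "Samsung"])

def safe_by_label(series, label, default=None):
    """Return the value for `label`, or `default` if the label is absent."""
    return series.loc[label] if label in series else default

def safe_by_position(series, position, default=None):
    """Return the value at `position`, or `default` if it is out of bounds."""
    if -len(series) <= position < len(series):
        return series.iloc[position]
    return default

print(safe_by_label(revenues, "Snapchat"))   # label is missing
print(safe_by_position(revenues, 132))       # position is out of bounds
print(revenues.get("Snapchat", 0))           # built-in label lookup with a default
```

Note that `in` only helps with labels; for positions, the bound has to be compared against `len(series)`.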

#### Multiple selection:

By index:

In [14]:
s[['Apple', 'Intel', 'Sony']]

Apple    274515
Intel     77867
Sony      84893
Name: Top Technology Companies by Revenue, dtype: int64

By position:

In [15]:
s.iloc[[0, 5, -1]]

Apple        274515
Huawei       129184
Panasonic     63191
Name: Top Technology Companies by Revenue, dtype: int64
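
Besides lists of labels or positions, both indexers accept slices, with one difference worth remembering: a label slice with `.loc` includes its end label, while a position slice with `.iloc` excludes its end position. A short sketch on a four-company subset (the variable names are illustrative only):

```python
import pandas as pd

revenues = pd.Series(
    [274515, 200734, 182527, 181945],
    index=["Apple", "Samsung", "Alphabet", "Foxconn"],
)

# Label slice: both endpoints are included.
label_slice = revenues.loc["Apple":"Alphabet"]

# Position slice: the stop position is excluded, as with Python lists.
position_slice = revenues.iloc[0:2]

print(label_slice)
print(position_slice)
```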

#### Activities:

##### 2. Check your knowledge: location by index

Select the revenue of `Intel` and store it in a variable named `intel_revenue`:

In [16]:
intel_revenue = s.loc['Intel']

##### 3. Check your knowledge: location by position

Select the revenue of the "second to last" element in our series `s` and store it in a variable named `second_to_last`:

In [17]:
second_to_last = s.iloc[-2]

##### 4. Check your knowledge: multiple selection

Use multiple label selection to retrieve the revenues of the companies:

* Samsung
* Dell Technologies
* Panasonic
* Microsoft

In [18]:
sub_series = s.loc[['Samsung','Dell Technologies','Panasonic','Microsoft']]
sub_series

Samsung              200734
Dell Technologies     92224
Panasonic             63191
Microsoft            143015
Name: Top Technology Companies by Revenue, dtype: int64

### Series Attributes and Methods

In [19]:
s.head()

Apple        274515
Samsung      200734
Alphabet     182527
Foxconn      181945
Microsoft    143015
Name: Top Technology Companies by Revenue, dtype: int64

In [20]:
s.tail()

Hitachi      82345
Intel        77867
IBM          73620
Tencent      69864
Panasonic    63191
Name: Top Technology Companies by Revenue, dtype: int64

#### Main Attributes

The underlying data:

In [21]:
s.values

array([274515, 200734, 182527, 181945, 143015, 129184,  92224,  85965,
        84893,  82345,  77867,  73620,  69864,  63191])

The index:

In [22]:
s.index

Index(['Apple', 'Samsung', 'Alphabet', 'Foxconn', 'Microsoft', 'Huawei',
       'Dell Technologies', 'Meta', 'Sony', 'Hitachi', 'Intel', 'IBM',
       'Tencent', 'Panasonic'],
      dtype='object')

The name (if any):

In [23]:
s.name

'Top Technology Companies by Revenue'

The type associated with the values:

In [24]:
s.dtype

dtype('int64')

The size of the Series:

In [25]:
s.size

14

`len` also works:

In [26]:
len(s)

14

#### Statistical methods

In [27]:
s.describe()

count        14.000000
mean     124420.642857
std       63686.481231
min       63191.000000
25%       78986.500000
50%       89094.500000
75%      172212.500000
max      274515.000000
Name: Top Technology Companies by Revenue, dtype: float64

In [28]:
s.mean()

124420.64285714286

In [29]:
s.median()

89094.5

In [30]:
s.std()

63686.48123135607

In [31]:
s.min(), s.max()

(63191, 274515)

In [32]:
s.quantile(.75)

172212.5

In [33]:
s.quantile(.99)

264923.47
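
The quantile results above are not values from the data because, by default, `quantile` interpolates linearly between the two nearest sorted values. The sketch below reproduces that calculation by hand; `linear_quantile` is a hypothetical helper written for this illustration and assumes the default `interpolation='linear'`.

```python
import pandas as pd

values = [
    274515, 200734, 182527, 181945, 143015,
    129184, 92224, 85965, 84893, 82345,
    77867, 73620, 69864, 63191,
]
s = pd.Series(values)

def linear_quantile(data, q):
    """Quantile via linear interpolation between the two nearest sorted values."""
    ordered = sorted(data)
    pos = q * (len(ordered) - 1)              # fractional position in the sorted data
    lower = int(pos)                          # index of the value just below
    upper = min(lower + 1, len(ordered) - 1)  # index of the value just above
    frac = pos - lower                        # how far to move between the two
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

manual = linear_quantile(values, 0.99)
print(manual, s.quantile(0.99))
```

For `q = 0.99` the position is `0.99 * 13 = 12.87`, so the result lies 87% of the way from the second-largest revenue (200734) to the largest (274515).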

### Activities

In [34]:
# Run this cell to complete the activity
american_companies = s[[
    'Meta', 'IBM', 'Microsoft',
    'Dell Technologies', 'Apple', 'Intel', 'Alphabet'
]]
american_companies

Meta                  85965
IBM                   73620
Microsoft            143015
Dell Technologies     92224
Apple                274515
Intel                 77867
Alphabet             182527
Name: Top Technology Companies by Revenue, dtype: int64

##### 5. What's the average revenue of American Companies?

In [35]:
american_companies.mean()

132819.0

##### 6. What's the median revenue of American Companies?

In [36]:
american_companies.median()

92224.0

### Sorting Series

#### Sorting by values or Index

Sorting by values; notice that the default order is ascending:

In [37]:
s.sort_values()

Panasonic             63191
Tencent               69864
IBM                   73620
Intel                 77867
Hitachi               82345
Sony                  84893
Meta                  85965
Dell Technologies     92224
Huawei               129184
Microsoft            143015
Foxconn              181945
Alphabet             182527
Samsung              200734
Apple                274515
Name: Top Technology Companies by Revenue, dtype: int64

Sorting by index (lexicographically, by company name); again, the default order is ascending:

In [38]:
s.sort_index()

Alphabet             182527
Apple                274515
Dell Technologies     92224
Foxconn              181945
Hitachi               82345
Huawei               129184
IBM                   73620
Intel                 77867
Meta                  85965
Microsoft            143015
Panasonic             63191
Samsung              200734
Sony                  84893
Tencent               69864
Name: Top Technology Companies by Revenue, dtype: int64

To sort in descending order, pass `ascending=False`:

In [39]:
s.sort_values(ascending=False).head()

Apple        274515
Samsung      200734
Alphabet     182527
Foxconn      181945
Microsoft    143015
Name: Top Technology Companies by Revenue, dtype: int64

In [40]:
s.sort_index(ascending=False).head()

Tencent       69864
Sony          84893
Samsung      200734
Panasonic     63191
Microsoft    143015
Name: Top Technology Companies by Revenue, dtype: int64

### Activities

##### 7. What company has the largest revenue?

In [41]:
s.sort_values(ascending=False).head(1)

Apple    274515
Name: Top Technology Companies by Revenue, dtype: int64
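
Sorting and taking the first row works, but when only the top entries are needed, `idxmax` and `nlargest` give the same information without sorting the whole Series yourself. A small sketch on made-up variable names and a three-company subset:

```python
import pandas as pd

revenues = pd.Series(
    [143015, 274515, 200734],
    index=["Microsoft", "Apple", "Samsung"],
)

top_company = revenues.idxmax()  # label of the largest value
top_two = revenues.nlargest(2)   # the two largest values, largest first

print(top_company)
print(top_two)
```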

##### 8. Sort company names lexicographically. Which one comes first?

In [42]:
s.sort_index()

Alphabet             182527
Apple                274515
Dell Technologies     92224
Foxconn              181945
Hitachi               82345
Huawei               129184
IBM                   73620
Intel                 77867
Meta                  85965
Microsoft            143015
Panasonic             63191
Samsung              200734
Sony                  84893
Tencent               69864
Name: Top Technology Companies by Revenue, dtype: int64

### Immutability

Run the sort methods above and check the series again. You'll see that `s` has NOT changed, because each sort method returns a new, sorted Series:

In [43]:
s.head()

Apple        274515
Samsung      200734
Alphabet     182527
Foxconn      181945
Microsoft    143015
Name: Top Technology Companies by Revenue, dtype: int64

We will sort the series by revenue, ascending, and we'll mutate the original one. Notice how the method doesn't return anything:

In [44]:
s.sort_values(inplace=True)

But now the series is sorted by revenue in ascending order:

In [45]:
s.head()

Panasonic    63191
Tencent      69864
IBM          73620
Intel        77867
Hitachi      82345
Name: Top Technology Companies by Revenue, dtype: int64

We'll now sort the series by index, mutating it again:

In [46]:
s.sort_index(inplace=True)

In [47]:
s.head()

Alphabet             182527
Apple                274515
Dell Technologies     92224
Foxconn              181945
Hitachi               82345
Name: Top Technology Companies by Revenue, dtype: int64
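
The two styles can be compared side by side. This is a minimal sketch on a three-element Series with illustrative variable names; it assumes, as the cells above show, that `sort_values(inplace=True)` sorts the Series itself and gives back nothing useful to assign.

```python
import pandas as pd

revenues = pd.Series(
    [200734, 63191, 274515],
    index=["Samsung", "Panasonic", "Apple"],
)

# Default: a new sorted Series is returned and the original is left alone.
sorted_copy = revenues.sort_values()
order_before = list(revenues.index)

# inplace=True: the original is reordered; capture the return value to inspect it.
returned = revenues.sort_values(inplace=True)
order_after = list(revenues.index)

print(order_before)
print(order_after)
print(returned)  # not the sorted Series
```

A common slip is writing `x = series.sort_values(inplace=True)`, which leaves `x` without the sorted data. Pick one style: either assign the result, or use `inplace=True` without assigning.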

### Activities

##### 9. Sort American Companies by Revenue

In [48]:
american_companies_desc = american_companies.sort_values(ascending=False)

##### 10. Sort (and mutate) international companies

In [49]:
# Run this cell to complete the activity
international_companies = s[[
    "Sony", "Tencent", "Panasonic",
    "Samsung", "Hitachi", "Foxconn", "Huawei"
]]
international_companies

Sony          84893
Tencent       69864
Panasonic     63191
Samsung      200734
Hitachi       82345
Foxconn      181945
Huawei       129184
Name: Top Technology Companies by Revenue, dtype: int64

In [50]:
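# With inplace=True the Series itself is sorted and the method returns None,
# so the result is not assigned to a new variable.
international_companies.sort_values(ascending=False, inplace=True)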

### Modifying series

Modifying values:

In [51]:
s['IBM'] = 0

In [52]:
s.sort_values().head()

IBM              0
Panasonic    63191
Tencent      69864
Intel        77867
Hitachi      82345
Name: Top Technology Companies by Revenue, dtype: int64

Adding elements:

In [53]:
s['Tesla'] = 21450

In [54]:
s.sort_values().head()

IBM              0
Tesla        21450
Panasonic    63191
Tencent      69864
Intel        77867
Name: Top Technology Companies by Revenue, dtype: int64

Removing elements:

In [55]:
del s['Tesla']

In [56]:
s.sort_values().head()

IBM              0
Panasonic    63191
Tencent      69864
Intel        77867
Hitachi      82345
Name: Top Technology Companies by Revenue, dtype: int64
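
`del` mutates the Series. When the original should stay intact, `drop` returns a new Series without the given labels. A brief sketch with illustrative variable names:

```python
import pandas as pd

revenues = pd.Series([274515, 21450], index=["Apple", "Tesla"])

# drop returns a new Series; `revenues` still contains Tesla.
without_tesla = revenues.drop("Tesla")

# errors="ignore" skips labels that are not present instead of raising KeyError.
unchanged = revenues.drop("Snapchat", errors="ignore")

print(without_tesla)
print(revenues)
```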

### Activities

##### 11. Insert Amazon's Revenue

In [57]:
s['Amazon'] = 469822

##### 12. Delete the revenue of Meta

In [58]:
del s['Meta']

### Concatenating Series

We can append series to other series using the `pd.concat()` function:

In [59]:
another_s = pd.Series([21_450, 4_120], index=['Tesla', 'Snapchat'])

In [60]:
another_s

Tesla       21450
Snapchat     4120
dtype: int64

In [61]:
s_new = pd.concat([s, another_s])

The original series `s` is not modified:

In [62]:
s

Alphabet             182527
Apple                274515
Dell Technologies     92224
Foxconn              181945
Hitachi               82345
Huawei               129184
IBM                       0
Intel                 77867
Microsoft            143015
Panasonic             63191
Samsung              200734
Sony                  84893
Tencent               69864
Amazon               469822
Name: Top Technology Companies by Revenue, dtype: int64

`s_new` is the concatenation of `s` and `another_s`:

In [63]:
s_new

Alphabet             182527
Apple                274515
Dell Technologies     92224
Foxconn              181945
Hitachi               82345
Huawei               129184
IBM                       0
Intel                 77867
Microsoft            143015
Panasonic             63191
Samsung              200734
Sony                  84893
Tencent               69864
Amazon               469822
Tesla                 21450
Snapchat               4120
dtype: int64
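
Two details of `pd.concat` are easy to miss: the result keeps a name only when all inputs share it (which is why the output above has no `Name:` line), and duplicate labels are allowed unless `verify_integrity=True` is passed. The sketch below uses small made-up Series to show both; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([274515, 21450], index=["Apple", "Tesla"], name="Revenue")
extra = pd.Series([21450, 4120], index=["Tesla", "Snapchat"])  # no name

# The names differ, so the result has no name; "Tesla" now appears twice.
combined = pd.concat([s, extra])

# rename with a scalar returns a new Series carrying that name.
named = combined.rename("Revenue")

# verify_integrity=True raises ValueError when index labels overlap.
try:
    pd.concat([s, extra], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True

print(combined.name, combined.index.is_unique, overlap_detected)
```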

### The End!