# Pandas Series Intro
Author: Joe Acosta
Date: 07/21/2025

Covers the basics I've learned about pandas Series, worked through as a set of questions and answers

In [1]:
# imports
import pandas as pd

In [2]:
revenue = [
    274515, 200734, 182527, 181945,
    143015, 129184, 92224, 85965, 84893,
    82345, 77867, 73620, 69864, 63191
]

companies = [
    'Apple', 'Samsung', 'Alphabet', 'Foxconn',
    'Microsoft', 'Huawei', 'Dell Technologies',
    'Meta', 'Sony', 'Hitachi', 'Intel',
    'IBM', 'Tencent', 'Panasonic'
]

Create a Series for revenue without an assigned index

In [3]:
pd.Series(revenue)

0     274515
1     200734
2     182527
3     181945
4     143015
5     129184
6      92224
7      85965
8      84893
9      82345
10     77867
11     73620
12     69864
13     63191
dtype: int64

Create a Series with an index and a name, store it, and display it

In [4]:
# data: the values stored in the Series (the most important component)
# index: the labels given to each element in the Series
# name: a title that's given to the Series

# A SERIES HOLDS A SINGLE DTYPE >> MIXED INPUTS ARE UPCAST TO A COMMON DTYPE (OFTEN object)
s = pd.Series(data=revenue, index=companies, name="Top Technology Companies by Revenue")
s

Apple                274515
Samsung              200734
Alphabet             182527
Foxconn              181945
Microsoft            143015
Huawei               129184
Dell Technologies     92224
Meta                  85965
Sony                  84893
Hitachi               82345
Intel                 77867
IBM                   73620
Tencent               69864
Panasonic             63191
Name: Top Technology Companies by Revenue, dtype: int64

Check the series elements datatype

In [5]:
s.dtype

dtype('int64')
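
A small sketch of how the dtype is chosen when the input values are not all of one type. The three Series below are made-up examples, not part of the notebook's data.

```python
import pandas as pd

# Hypothetical inputs used only to illustrate dtype inference
ints = pd.Series([274515, 200734])            # all integers
mixed_num = pd.Series([274515, 200734.5])     # integer + float
mixed_obj = pd.Series([274515, "unknown"])    # integer + string

print(ints.dtype)       # integer dtype
print(mixed_num.dtype)  # upcast to a float dtype
print(mixed_obj.dtype)  # falls back to object
```

Every element shares the one dtype reported by `.dtype`, so a single stray string is enough to turn a numeric Series into an `object` Series.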

In [6]:
# Data can be retrieved by index or by position.

Get the data for Apple

In [7]:
# Get Data by index example
s['Apple']

np.int64(274515)

In [8]:
# Preferred way though both work
s.loc['Apple']

np.int64(274515)

Get the data for Apple and IBM

In [9]:
s.loc[['Apple', 'IBM']]

Apple    274515
IBM       73620
Name: Top Technology Companies by Revenue, dtype: int64

Get the data stored at position 3

In [10]:
s.iloc[3] # prefer .iloc for positions; s[3] on a label-based index is deprecated in pandas 2.x

np.int64(181945)

Get the data for multiple positions >> 1 through 4 (the slice end, 5, is excluded)

In [11]:
s.iloc[1:5]

Samsung      200734
Alphabet     182527
Foxconn      181945
Microsoft    143015
Name: Top Technology Companies by Revenue, dtype: int64
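
Position slices and label slices treat their end point differently, which is easy to trip over. A minimal sketch on a five-company subset of the data (the subset is only there to keep the example short):

```python
import pandas as pd

s = pd.Series(
    [274515, 200734, 182527, 181945, 143015],
    index=["Apple", "Samsung", "Alphabet", "Foxconn", "Microsoft"],
)

# .iloc slices by position and EXCLUDES the end position
by_position = s.iloc[1:3]

# .loc slices by label and INCLUDES the end label
by_label = s.loc["Samsung":"Foxconn"]

print(by_position)  # Samsung, Alphabet
print(by_label)     # Samsung, Alphabet, Foxconn
```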

Get the data for the positions 1, 4, -3

In [12]:
s.iloc[[1, 4, -3]]

Samsung      200734
Microsoft    143015
IBM           73620
Name: Top Technology Companies by Revenue, dtype: int64

Get the last element in the Series

In [13]:
s.iloc[-1]

np.int64(63191)

Is Snapchat in the list of companies?

In [14]:
'Snapchat' in s

False
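
Note that `in` on a Series tests the index labels, not the stored values. A short sketch on a three-company subset:

```python
import pandas as pd

s = pd.Series([274515, 200734, 73620], index=["Apple", "Samsung", "IBM"])

label_found = "Apple" in s            # checks the index labels
value_as_label = 274515 in s          # still checks labels, so no match
value_found = 274515 in s.values      # checks the underlying values

print(label_found, value_as_label, value_found)
```

To test values while staying in pandas, `s.isin([274515]).any()` is an alternative to going through `.values`.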

Get the name of the series

In [15]:
s.name

'Top Technology Companies by Revenue'

Get the index of the series

In [16]:
s.index

Index(['Apple', 'Samsung', 'Alphabet', 'Foxconn', 'Microsoft', 'Huawei',
       'Dell Technologies', 'Meta', 'Sony', 'Hitachi', 'Intel', 'IBM',
       'Tencent', 'Panasonic'],
      dtype='object')

Get the values in the series

In [17]:
s.values

array([274515, 200734, 182527, 181945, 143015, 129184,  92224,  85965,
        84893,  82345,  77867,  73620,  69864,  63191])

Get all the basic quantitative information for the series

In [18]:
s.describe()

count        14.000000
mean     124420.642857
std       63686.481231
min       63191.000000
25%       78986.500000
50%       89094.500000
75%      172212.500000
max      274515.000000
Name: Top Technology Companies by Revenue, dtype: float64

Get the maximum, minimum, and mean for the data

In [19]:
print(s.max())
print(s.min())
print(s.mean())

274515
63191
124420.64285714286


What is the 95th percentile of revenue for these companies?

In [20]:
s.quantile(.95)

np.float64(226557.34999999998)

What is the 10th percentile of revenue for these companies (the cutoff for the lowest ten percent)?

In [21]:
s.quantile(.10)

np.float64(70990.8)

How many companies are recorded in the series?

In [22]:
s.size # Alternatively len(s)

14

Create sub-series:

In [23]:
international_companies= [
    "Sony", "Tencent", "Panasonic",
    "Samsung", "Hitachi", "Foxconn", "Huawei"
]

domestic_companies= ['Meta', 'IBM', 'Microsoft',
    'Dell Technologies', 'Apple', 'Intel', 'Alphabet'
]

In [24]:
american_companies = s.loc[domestic_companies]
american_companies


Meta                  85965
IBM                   73620
Microsoft            143015
Dell Technologies     92224
Apple                274515
Intel                 77867
Alphabet             182527
Name: Top Technology Companies by Revenue, dtype: int64

In [25]:
international_s=s.loc[international_companies]
international_s

Sony          84893
Tencent       69864
Panasonic     63191
Samsung      200734
Hitachi       82345
Foxconn      181945
Huawei       129184
Name: Top Technology Companies by Revenue, dtype: int64

Is Samsung an American company?

In [26]:
'Samsung' in american_companies

False

Sort the American companies in reverse alphabetical order

In [27]:
american_companies.sort_index(ascending=False)

Microsoft            143015
Meta                  85965
Intel                 77867
IBM                   73620
Dell Technologies     92224
Apple                274515
Alphabet             182527
Name: Top Technology Companies by Revenue, dtype: int64

Sort the American companies by revenue, highest first

In [28]:
# a Series is a 1-D ordered sequence, so sort_values() needs no `by` argument, unlike DataFrame.sort_values()
american_companies.sort_values(ascending=False)

Apple                274515
Alphabet             182527
Microsoft            143015
Dell Technologies     92224
Meta                  85965
Intel                 77867
IBM                   73620
Name: Top Technology Companies by Revenue, dtype: int64

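NOTE THAT THE SORT METHODS RETURN A NEW SERIES

A Series itself is mutable (values can be assigned and labels added or deleted), but sort_index() and sort_values() return a sorted copy by default. This is why the sort functions don't affect the original series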
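
A quick check of what sorting does and does not change, sketched on a three-company subset:

```python
import pandas as pd

s = pd.Series([85965, 274515, 77867], index=["Meta", "Apple", "Intel"])

# sort_values() returns a new, sorted Series
by_revenue = s.sort_values(ascending=False)

print(list(s.index))           # original order is untouched
print(list(by_revenue.index))  # sorted copy

# Direct assignment does modify the Series in place
s["Intel"] = 0
print(s["Intel"])
```

To keep the sorted order, reassign the result (`s = s.sort_values()`) or pass `inplace=True`.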

Add Tesla to the Series

In [29]:
Tesla = 21450

In [30]:
s['Tesla'] = Tesla

In [31]:
s.loc['Tesla']

np.int64(21450)

In [32]:
'Tesla' in s

True

Set Tesla Revenue to zero

In [33]:
s['Tesla'] = 0

Remove Tesla from the series and check that it was removed

In [34]:
del s['Tesla']

In [35]:
'Tesla' in s

False

Create a new series with Tesla and Snapchat

In [36]:
ts_revenue=[21450, 4120]
ts_names=['Tesla', 'Snapchat']

In [37]:
ts=pd.Series(data=ts_revenue, index=ts_names, name='Tesla and Snapchat')
ts

Tesla       21450
Snapchat     4120
Name: Tesla and Snapchat, dtype: int64

Add the TS series to the original series

In [38]:
merged=pd.concat([s,ts])
merged

Apple                274515
Samsung              200734
Alphabet             182527
Foxconn              181945
Microsoft            143015
Huawei               129184
Dell Technologies     92224
Meta                  85965
Sony                  84893
Hitachi               82345
Intel                 77867
IBM                   73620
Tencent               69864
Panasonic             63191
Tesla                 21450
Snapchat               4120
dtype: int64

Create a rev and com series using the revenue and company data

In [39]:
rev=pd.Series(data=revenue)
com=pd.Series(data=companies)

Order the company series based on the order generated from the revenue series. This is based on the revenue in ascending order

In [40]:
sorted_rev=rev.sort_values()
com.iloc[sorted_rev.index]

13            Panasonic
12              Tencent
11                  IBM
10                Intel
9               Hitachi
8                  Sony
7                  Meta
6     Dell Technologies
5                Huawei
4             Microsoft
3               Foxconn
2              Alphabet
1               Samsung
0                 Apple
dtype: object

What was the total revenue for the foreign companies

In [41]:
international_s.sum()

np.int64(812156)

What share of the combined American revenue did each company produce? (The result is a fraction; multiply by 100 for a percentage)

In [42]:
american_companies / american_companies.sum()

Meta                 0.092462
IBM                  0.079184
Microsoft            0.153824
Dell Technologies    0.099194
Apple                0.295262
Intel                0.083752
Alphabet             0.196322
Name: Top Technology Companies by Revenue, dtype: float64

Rank each company by revenue (by default rank 1.0 goes to the lowest revenue)

In [43]:
s.rank()

Apple                14.0
Samsung              13.0
Alphabet             12.0
Foxconn              11.0
Microsoft            10.0
Huawei                9.0
Dell Technologies     8.0
Meta                  7.0
Sony                  6.0
Hitachi               5.0
Intel                 4.0
IBM                   3.0
Tencent               2.0
Panasonic             1.0
Name: Top Technology Companies by Revenue, dtype: float64
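
Because `rank()` ranks in ascending order by default, the largest company receives the largest number. A sketch of flipping that so rank 1 is the highest revenue, using a three-company subset:

```python
import pandas as pd

s = pd.Series([274515, 73620, 85965], index=["Apple", "IBM", "Meta"])

# ascending=False gives rank 1 to the largest value;
# astype(int) is safe here because there are no ties (no fractional ranks)
ranks = s.rank(ascending=False).astype(int)
print(ranks)
```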

What's the index label of the company with the highest revenue?

In [44]:
s.idxmax()

'Apple'

Calculate each company's revenue if they each got an extra 20%

In [45]:
s * 1.2

Apple                329418.0
Samsung              240880.8
Alphabet             219032.4
Foxconn              218334.0
Microsoft            171618.0
Huawei               155020.8
Dell Technologies    110668.8
Meta                 103158.0
Sony                 101871.6
Hitachi               98814.0
Intel                 93440.4
IBM                   88344.0
Tencent               83836.8
Panasonic             75829.2
Name: Top Technology Companies by Revenue, dtype: float64

Add 300 to each company's revenue

In [46]:
s.add(300)

Apple                274815
Samsung              201034
Alphabet             182827
Foxconn              182245
Microsoft            143315
Huawei               129484
Dell Technologies     92524
Meta                  86265
Sony                  85193
Hitachi               82645
Intel                 78167
IBM                   73920
Tencent               70164
Panasonic             63491
Name: Top Technology Companies by Revenue, dtype: int64

Subtract 500 from each company's revenue

In [47]:
s.sub(500)

Apple                274015
Samsung              200234
Alphabet             182027
Foxconn              181445
Microsoft            142515
Huawei               128684
Dell Technologies     91724
Meta                  85465
Sony                  84393
Hitachi               81845
Intel                 77367
IBM                   73120
Tencent               69364
Panasonic             62691
Name: Top Technology Companies by Revenue, dtype: int64

Check if the company revenue was greater than 50000 for each company

In [48]:
s > 50000 # vectorized comparison; equivalent to s.apply(lambda x: x > 50000)

Apple                True
Samsung              True
Alphabet             True
Foxconn              True
Microsoft            True
Huawei               True
Dell Technologies    True
Meta                 True
Sony                 True
Hitachi              True
Intel                True
IBM                  True
Tencent              True
Panasonic            True
Name: Top Technology Companies by Revenue, dtype: bool

Calculate the cumulative revenue for each company after sorting the companies by name

In [49]:
s.sort_index().values.cumsum()

array([ 182527,  457042,  549266,  731211,  813556,  942740, 1016360,
       1094227, 1180192, 1323207, 1386398, 1587132, 1672025, 1741889])
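
Going through `.values` returns a bare NumPy array, so the company labels are lost. A sketch of the same running total that keeps the labels, shown on a three-company subset:

```python
import pandas as pd

s = pd.Series([85965, 274515, 73620], index=["Meta", "Apple", "IBM"])

# Series.cumsum() keeps the index, so each running total stays labelled
running_total = s.sort_index().cumsum()
print(running_total)
```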

This is where the deviation started and I can update this instead of the entire notebook

## Bool arr for conditional selections

In [50]:
# A boolean array used for selection must be the same length as the series. The entries where it is True are returned as a series

# A comparison on a series produces a boolean series of the same size, which can then be used to select the entries that meet the condition


Get all the companies that have a revenue greater than 50000

In [None]:
# The condition creates a boolean series, which we then pass to the selection method (loc)
s.loc[s > 50000]

Apple                274515
Samsung              200734
Alphabet             182527
Foxconn              181945
Microsoft            143015
Huawei               129184
Dell Technologies     92224
Meta                  85965
Sony                  84893
Hitachi               82345
Intel                 77867
IBM                   73620
Tencent               69864
Panasonic             63191
Name: Top Technology Companies by Revenue, dtype: int64
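
It can help to look at the boolean mask on its own before using it for selection. A minimal sketch with a four-company subset and a threshold of 75000 (both chosen only for illustration):

```python
import pandas as pd

s = pd.Series(
    [274515, 85965, 73620, 63191],
    index=["Apple", "Meta", "IBM", "Panasonic"],
)

mask = s > 75000            # boolean Series with the same index as s
print(mask)

selected = s.loc[mask]      # keeps only the rows where mask is True
n_matches = int(mask.sum()) # True counts as 1, so sum() counts matches
print(selected)
print(n_matches)
```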

Combining Series methods and comparison operators for conditional selections

Which companies have revenue above the 90th percentile?

In [53]:
s.loc[s.quantile(.90) < s]

Apple      274515
Samsung    200734
Name: Top Technology Companies by Revenue, dtype: int64

In [55]:
s.describe()

count        14.000000
mean     124420.642857
std       63686.481231
min       63191.000000
25%       78986.500000
50%       89094.500000
75%      172212.500000
max      274515.000000
Name: Top Technology Companies by Revenue, dtype: float64

In [None]:
# Boolean operators in pandas differ from plain Python.
# Python's and, or, and not work on single (scalar) truth values, but conditional selection needs element-wise (vector) operators. Pandas uses &, |, and ~ for this; wrap each comparison in parentheses because these operators bind more tightly than comparisons


Which companies' revenue was not within 1.5 standard deviations of the mean?

In [66]:
# get the bounds 1.5 standard deviations below and above the mean
lower_std=s.mean() - (1.5 * s.std())
upper_std=s.mean() + (1.5 * s.std())

In [73]:
s.loc[~((lower_std <= s) & (s <= upper_std))]

Apple    274515
Name: Top Technology Companies by Revenue, dtype: int64
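
Two other ways to write the same outlier selection, sketched here as alternatives rather than the notebook's method: combining the two "outside" conditions with `|`, and negating `Series.between`, which is inclusive on both ends by default.

```python
import pandas as pd

revenue = [
    274515, 200734, 182527, 181945,
    143015, 129184, 92224, 85965, 84893,
    82345, 77867, 73620, 69864, 63191,
]
companies = [
    "Apple", "Samsung", "Alphabet", "Foxconn",
    "Microsoft", "Huawei", "Dell Technologies",
    "Meta", "Sony", "Hitachi", "Intel",
    "IBM", "Tencent", "Panasonic",
]
s = pd.Series(revenue, index=companies)

lower = s.mean() - 1.5 * s.std()
upper = s.mean() + 1.5 * s.std()

# Each comparison is parenthesised so that | applies to the boolean Series
outliers_or = s.loc[(s < lower) | (s > upper)]

# between() builds the "inside" mask; ~ flips it to "outside"
outliers_between = s.loc[~s.between(lower, upper)]

print(outliers_or)
print(outliers_between)
```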

Insert Somewhere:

Calculate the sum of the foreign and domestic companies with the highest revenue

Change the name of the domestic series to a name of your choosing

In [1]:
# Put this question in the titanic bit:
# 10. Select all women whose ages are even, and are older than 30 y/o

## Learning Vectorized Operations

In [4]:
# Commonly used with dataframes; the concept originates from NumPy, the library that pandas is built on
# A vectorized operation applies to an entire series/column of a df at once, with no explicit Python loop

# Vectorized operations can be the common arithmetic operators or more complex functions created by the user

In [None]:
# USE THIS IN A QUESTION THAT ASKS THE USER TO CALCULATE THE RECESSION IMPACT USING OPERATIONS BETWEEN SERIES
# Note: operations between series align on index labels, not position, so the indexes should match (labels missing from either side give NaN)
# Needs the pandas import (pd) and the companies list from the top of the notebook to have been run

recession_impact = pd.Series([
    0.91, 0.93, 0.98, 0.97, 0.99, 0.89, 0.87,
    0.82, 0.93, 0.93, 0.89, 0.97, 0.97, 0.94], index=companies)
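
A sketch of how arithmetic between two Series lines values up by index label rather than by position. The `impact` Series here is a hypothetical two-company subset, given in a different order and deliberately missing Samsung:

```python
import pandas as pd

revenue = pd.Series(
    [274515, 200734, 182527],
    index=["Apple", "Samsung", "Alphabet"],
)

# Hypothetical impact factors: different order, and no entry for Samsung
impact = pd.Series([0.98, 0.91], index=["Alphabet", "Apple"])

# Values are matched by label; labels missing from either side become NaN
adjusted = revenue * impact
print(adjusted)

# mul() with fill_value treats a missing factor as 1.0 (no change)
filled = revenue.mul(impact, fill_value=1.0)
print(filled)
```

When both Series share exactly the same index, as with the full company list, every label finds a partner and no NaN values appear.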

In [6]:
# Create a few problems where we use the .isin([,,,]) example for conditional selection
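
One possible shape for such a problem, sketched on a three-company subset. `Series.isin` tests the values, while `Index.isin` tests the labels; names in the list that are absent (like Snapchat here) are simply ignored.

```python
import pandas as pd

s = pd.Series([274515, 200734, 73620], index=["Apple", "Samsung", "IBM"])

# Select by label membership
by_label = s[s.index.isin(["Apple", "IBM", "Snapchat"])]

# Select by value membership
by_value = s[s.isin([73620, 274515])]

print(by_label)
print(by_value)
```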

In [None]:
# Can I use s.drop_duplicates()?
# Find the people that have null ages in the titanic dataset
# Also find a place to squeeze in the notnull()
# Convert all of the company names to lowercase
# convert all of the company titles to uppercase
# For the companies use the lower() function. Remember that string methods on a series go through the .str accessor (e.g. com.str.lower())
# do more research on accessors
# Get the first word first_words = df.title.str.split().str[0]
# Calculate the number of characters in each company name (remember, this uses the str accessor)
# How do I count the number of individual characters in a str?


# df.rename(columns={'Old': 'new'}, inplace=True) renames a column or two


# df.isnull().sum().sum() gets the null count for each column (a series), then the total for the whole dataset

# Create a column that
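
A possible starting point for the string questions in the notes above, sketched with the `.str` accessor on a three-name subset. Counting the letter "e" is just an example character.

```python
import pandas as pd

names = pd.Series(["Apple", "Dell Technologies", "IBM"])

lower = names.str.lower()                 # all lowercase
upper = names.str.upper()                 # all uppercase
lengths = names.str.len()                 # characters per name
first_words = names.str.split().str[0]    # first word of each name
e_counts = names.str.count("e")           # occurrences of "e" per name

print(lower.tolist())
print(upper.tolist())
print(lengths.tolist())
print(first_words.tolist())
print(e_counts.tolist())
```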