# Introduction to Pandas DataFrames

## What is a DataFrame?
Simply put, a DataFrame is Python's way of representing data in tabular form.

DataFrames come from pandas, Python's powerful library for data analysis, and they give us an efficient way to work with large amounts of structured data.

### Is it similar to Excel?

Just as Excel uses spreadsheets, Python uses pandas DataFrames.

> Python is often preferred over Excel for large datasets because it tends to scale better and run faster.

### How is a DataFrame composed?
So what does a DataFrame look like? It is made up of rows and columns, making it two-dimensional. Let's take a look below:

We see that our Index is made up of the company names. An index can also be made up of numbers.

> An Index is like an address: it can be used to locate both rows and columns.

More on that later.

So, let's jump into our Jupyter notebook and create the DataFrame from this example.

The dataset for today's lab contains information on the Top Tech Companies in the World as shown below:

Our DataFrame contains five columns:

- Revenue
- Employees
- Sector
- Founding Date
- Country

The syntax to create a dataframe is:

<code>
import pandas as pd
pd.DataFrame(data, index)
</code>

- data: These are the values from our dataset.
- index: This is like the address of the data we are storing.
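As a minimal sketch of the two arguments described above, the snippet below builds a tiny frame from three of the lab's companies. The variable names `labelled` and `numbered` are illustrative only. It also shows that when no index is supplied, pandas falls back to numbering the rows from 0.

```python
import pandas as pd

# A small subset of the lab's data, used only for illustration
data = {'Revenue': [274515, 200734, 182527],
        'Country': ['USA', 'South Korea', 'USA']}
index = ['Apple', 'Samsung', 'Alphabet']

# Passing the arguments by keyword makes their roles explicit
labelled = pd.DataFrame(data=data, index=index)

# Without an index, pandas numbers the rows 0, 1, 2, ...
numbered = pd.DataFrame(data=data)

print(labelled.index.tolist())  # ['Apple', 'Samsung', 'Alphabet']
print(numbered.index.tolist())  # [0, 1, 2]
```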

In [50]:
import pandas as pd

# Creating an empty DataFrame
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []


Now let's add our data:

In [51]:
# Lists of data
data = {'Revenue': [274515,200734,182527,181945,143015,129184,92224,85965,84893,
                    82345,77867,73620,69864,63191],
        'Employees': [147000,267937,135301,878429,163000,197000,158000,58604,
                      109700,350864,110600,364800,85858,243540],
        'Sector': ['Consumer Electronics','Consumer Electronics','Software Services',
                   'Chip Manufacturing','Software Services','Consumer Electronics',
                   'Consumer Electronics','Software Services','Consumer Electronics',
                   'Consumer Electronics','Chip Manufacturing','Software Services',
                   'Software Services','Consumer Electronics'],
        'Founding Date':['01-04-1976','13-01-1969','04-09-1998','20-02-1974',
                         '04-04-1975','15-09-1987','01-02-1984','04-02-2004',
                         '07-04-1946','01-01-1910','18-07-1968','16-06-1911',
                         '11-11-1998','07-03-1918'],
        'Country':['USA','South Korea','USA','Taiwan','USA','China','USA','USA',
                   'Japan','Japan','USA','USA','China','Japan']} 
index = ['Apple','Samsung','Alphabet','Foxconn','Microsoft','Huawei',
         'Dell Technologies','Meta','Sony','Hitachi','Intel','IBM',
         'Tencent','Panasonic']

In [52]:
# Creating a new DataFrame with the data above
df = pd.DataFrame(data, index)

# Display the DataFrame
df

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country
Apple,274515,147000,Consumer Electronics,01-04-1976,USA
Samsung,200734,267937,Consumer Electronics,13-01-1969,South Korea
Alphabet,182527,135301,Software Services,04-09-1998,USA
Foxconn,181945,878429,Chip Manufacturing,20-02-1974,Taiwan
Microsoft,143015,163000,Software Services,04-04-1975,USA
Huawei,129184,197000,Consumer Electronics,15-09-1987,China
Dell Technologies,92224,158000,Consumer Electronics,01-02-1984,USA
Meta,85965,58604,Software Services,04-02-2004,USA
Sony,84893,109700,Consumer Electronics,07-04-1946,Japan
Hitachi,82345,350864,Consumer Electronics,01-01-1910,Japan


## Basic Navigation & Browsing Techniques

So, how do we truly harness the power of pandas DataFrames?

Let's explore some functions:

#### a) head
The head() method displays the first few rows of your DataFrame, making it easier to get a sense of the overall structure and content. It will display the first five rows by default.

<code>df.head()</code>

In [53]:
df.head()

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country
Apple,274515,147000,Consumer Electronics,01-04-1976,USA
Samsung,200734,267937,Consumer Electronics,13-01-1969,South Korea
Alphabet,182527,135301,Software Services,04-09-1998,USA
Foxconn,181945,878429,Chip Manufacturing,20-02-1974,Taiwan
Microsoft,143015,163000,Software Services,04-04-1975,USA


We can see more of our dataset by specifying the number of rows inside the brackets, .head(n), as shown below:

<code>df.head(10)</code>

> This function is especially useful as you can quickly inspect your DataFrame using the head() method to ensure that all of the data is stored correctly and as expected.

In [54]:
df.head(10)

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country
Apple,274515,147000,Consumer Electronics,01-04-1976,USA
Samsung,200734,267937,Consumer Electronics,13-01-1969,South Korea
Alphabet,182527,135301,Software Services,04-09-1998,USA
Foxconn,181945,878429,Chip Manufacturing,20-02-1974,Taiwan
Microsoft,143015,163000,Software Services,04-04-1975,USA
Huawei,129184,197000,Consumer Electronics,15-09-1987,China
Dell Technologies,92224,158000,Consumer Electronics,01-02-1984,USA
Meta,85965,58604,Software Services,04-02-2004,USA
Sony,84893,109700,Consumer Electronics,07-04-1946,Japan
Hitachi,82345,350864,Consumer Electronics,01-01-1910,Japan


#### b) tail
The tail() method is similar to head(), except it displays the last few rows of your DataFrame. It also displays five rows by default.

<code>df.tail()</code>

In [55]:
df.tail()

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country
Hitachi,82345,350864,Consumer Electronics,01-01-1910,Japan
Intel,77867,110600,Chip Manufacturing,18-07-1968,USA
IBM,73620,364800,Software Services,16-06-1911,USA
Tencent,69864,85858,Software Services,11-11-1998,China
Panasonic,63191,243540,Consumer Electronics,07-03-1918,Japan


As with head(), we can specify the number of rows we want to display with .tail(n).

<code>df.tail(10)</code>

> This is useful for quickly identifying problems with your dataset, since issues such as incomplete or malformed rows often show up at the end rather than the beginning.

In [56]:
df.tail(10)

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country
Microsoft,143015,163000,Software Services,04-04-1975,USA
Huawei,129184,197000,Consumer Electronics,15-09-1987,China
Dell Technologies,92224,158000,Consumer Electronics,01-02-1984,USA
Meta,85965,58604,Software Services,04-02-2004,USA
Sony,84893,109700,Consumer Electronics,07-04-1946,Japan
Hitachi,82345,350864,Consumer Electronics,01-01-1910,Japan
Intel,77867,110600,Chip Manufacturing,18-07-1968,USA
IBM,73620,364800,Software Services,16-06-1911,USA
Tencent,69864,85858,Software Services,11-11-1998,China
Panasonic,63191,243540,Consumer Electronics,07-03-1918,Japan


#### c) info
The info() method prints a summary of your DataFrame: the index range, each column's name, non-null count and data type, and the total memory usage.

<code>df.info()</code>

> This makes it easy to see how much memory the DataFrame uses and can help identify potential problems such as missing values or incorrect data types.

In [57]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 14 entries, Apple to Panasonic
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Revenue        14 non-null     int64 
 1   Employees      14 non-null     int64 
 2   Sector         14 non-null     object
 3   Founding Date  14 non-null     object
 4   Country        14 non-null     object
dtypes: int64(2), object(3)
memory usage: 672.0+ bytes
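The summary above lists Founding Date as `object`, meaning the dates are stored as plain strings. One possible follow-up, sketched here on three of the lab's rows, is to convert them with `pd.to_datetime`. The frame name `dates` is illustrative, and the format string assumes the day-month-year layout seen in the dataset.

```python
import pandas as pd

# Three founding dates from the lab's data, stored as strings
dates = pd.DataFrame(
    {'Founding Date': ['01-04-1976', '13-01-1969', '04-09-1998']},
    index=['Apple', 'Samsung', 'Alphabet'])

# The strings are day-month-year, so state the format explicitly
dates['Founding Date'] = pd.to_datetime(dates['Founding Date'],
                                        format='%d-%m-%Y')

print(dates.dtypes)                                # now a datetime dtype
print(dates.loc['Samsung', 'Founding Date'].year)  # 1969
```

Once the column holds real dates, components such as the year or month can be read directly instead of being parsed out of text.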


#### d) shape
The shape attribute returns a tuple with the number of rows and columns (rows, columns) in our DataFrame. Because it is an attribute rather than a method, it is written without brackets.

<code>df.shape</code>

> This gives us a quick insight into the dimensionality of our DataFrame.

In [58]:
df.shape

(14, 5)
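Because the result is an ordinary tuple, it can be unpacked into two variables. The sketch below uses a three-row subset of the lab's data, and the names `df_small`, `n_rows` and `n_cols` are illustrative.

```python
import pandas as pd

df_small = pd.DataFrame(
    {'Revenue': [274515, 200734, 182527],
     'Employees': [147000, 267937, 135301]},
    index=['Apple', 'Samsung', 'Alphabet'])

# Unpack the (rows, columns) tuple
n_rows, n_cols = df_small.shape

print(n_rows, n_cols)         # 3 2
print(len(df_small))          # 3, the number of rows
print(len(df_small.columns))  # 2, the number of columns
```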

#### e) describe
The describe() method displays descriptive statistics for the numerical columns in your DataFrame: the count, mean, standard deviation, minimum, quartiles (the 50% row is the median) and maximum.

<code>df.describe()</code>

> This can be very useful for understanding the distribution of values across a specific dataset or column without having to manually calculate each statistic.

In [59]:
df.describe()

Unnamed: 0,Revenue,Employees
count,14.0,14.0
mean,124420.642857,233616.642857
std,63686.481231,207583.087389
min,63191.0,58604.0
25%,78986.5,116775.25
50%,89094.5,160500.0
75%,172212.5,261837.75
max,274515.0,878429.0
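The output has no row called "median", which can be confusing at first. As a small check, assuming only the first five revenue figures from the lab's data, the `50%` row is the same value that `median()` returns.

```python
import pandas as pd

# The first five revenue values from the lab's data
revenue = pd.Series([274515, 200734, 182527, 181945, 143015],
                    name='Revenue')

summary = revenue.describe()

print(summary['50%'])    # 182527.0
print(revenue.median())  # 182527.0
```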


#### f) nunique
The nunique() method counts the number of distinct elements in each column.

<code>df.nunique()</code>

> This can be very useful for understanding the number of categories we have in a column for example.

In [60]:
df.nunique()

Revenue          14
Employees        14
Sector            3
Founding Date    14
Country           5
dtype: int64
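Knowing how many categories exist often leads to the next question: how many rows fall into each one? A possible way to answer it is `value_counts()`, sketched here on the sectors of the first five companies. The Series name `sector` is illustrative.

```python
import pandas as pd

# Sectors of the first five companies in the lab's data
sector = pd.Series(
    ['Consumer Electronics', 'Consumer Electronics', 'Software Services',
     'Chip Manufacturing', 'Software Services'],
    index=['Apple', 'Samsung', 'Alphabet', 'Foxconn', 'Microsoft'],
    name='Sector')

# Number of distinct categories
print(sector.nunique())  # 3

# Number of rows in each category
counts = sector.value_counts()
print(counts['Consumer Electronics'])  # 2
print(counts['Chip Manufacturing'])    # 1
```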

#### g) isnull
The isnull() method detects missing values by creating a DataFrame of the same shape with a boolean value of True for missing (NULL) values and False otherwise.

<code>df.isnull()</code>

In [61]:
df.isnull()

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country
Apple,False,False,False,False,False
Samsung,False,False,False,False,False
Alphabet,False,False,False,False,False
Foxconn,False,False,False,False,False
Microsoft,False,False,False,False,False
Huawei,False,False,False,False,False
Dell Technologies,False,False,False,False,False
Meta,False,False,False,False,False
Sony,False,False,False,False,False
Hitachi,False,False,False,False,False


We can take this one step further by applying the sum() function to get the total number of NULL values in each column of our DataFrame.

<code>df.isnull().sum()</code>

In [62]:
df.isnull().sum()

Revenue          0
Employees        0
Sector           0
Founding Date    0
Country          0
dtype: int64
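Our dataset has no gaps, so every count is 0. To see the method catch something, the sketch below uses a hypothetical frame named `gaps` in which two values have been removed on purpose. It is not part of the lab's data.

```python
import pandas as pd

# Hypothetical frame: two values are deliberately left missing
gaps = pd.DataFrame(
    {'Revenue': [274515, None, 182527],
     'Country': ['USA', 'South Korea', None]},
    index=['Apple', 'Samsung', 'Alphabet'])

# Missing values in each column
per_column = gaps.isnull().sum()
print(per_column['Revenue'])  # 1
print(per_column['Country'])  # 1

# Summing a second time gives the total for the whole DataFrame
total = int(gaps.isnull().sum().sum())
print(total)  # 2
```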

#### Knowledge check

##### 1. Output the first four rows of the DataFrame

In [63]:
head_first_four = df.head(4)
head_first_four

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country
Apple,274515,147000,Consumer Electronics,01-04-1976,USA
Samsung,200734,267937,Consumer Electronics,13-01-1969,South Korea
Alphabet,182527,135301,Software Services,04-09-1998,USA
Foxconn,181945,878429,Chip Manufacturing,20-02-1974,Taiwan


##### 2. Output the last six rows of the DataFrame

In [64]:
tail_last_six = df.tail(6)
tail_last_six

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country
Sony,84893,109700,Consumer Electronics,07-04-1946,Japan
Hitachi,82345,350864,Consumer Electronics,01-01-1910,Japan
Intel,77867,110600,Chip Manufacturing,18-07-1968,USA
IBM,73620,364800,Software Services,16-06-1911,USA
Tencent,69864,85858,Software Services,11-11-1998,China
Panasonic,63191,243540,Consumer Electronics,07-03-1918,Japan


### Column Selection

#### Column selection:
Before we begin this section, it is important that we can differentiate between a Series and a DataFrame.

* A Series is a single column in a DataFrame
* A DataFrame is an entire table of data.

Let's take a look at how we can choose these items:

#### a) Select One Column
The [] operator can be used to select a specific column within a DataFrame. The output is a Series.

<code>df['column_name']</code>

In [65]:
df['Country']

Apple                        USA
Samsung              South Korea
Alphabet                     USA
Foxconn                   Taiwan
Microsoft                    USA
Huawei                     China
Dell Technologies            USA
Meta                         USA
Sony                       Japan
Hitachi                    Japan
Intel                        USA
IBM                          USA
Tencent                    China
Panasonic                  Japan
Name: Country, dtype: object

#### b) Select One Column and Apply Methods
DataFrames also allow the user to apply methods to columns. These methods include sum(), mean(), min(), max(), median() and more.

<code>df['column_name'].sum()</code>

##### With numeric datatypes:

In [66]:
# Find the total number of people employed by these companies
df['Employees'].sum()

3270633

In [67]:
# Find the lowest revenue 
df['Revenue'].min()

63191

In [68]:
# Find the highest revenue
df['Revenue'].max()

274515

In [69]:
# Find the average revenue
df['Revenue'].mean()

124420.64285714286

In [70]:
# Find the average revenue rounded to the nearest whole number 
round(df['Revenue'].mean())

124421

In [71]:
# Find the median revenue rounded to the nearest whole number   
round(df['Revenue'].median())

89094
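The median here is 89094.5, so rounding up to 89095 might have been expected. Python's built-in `round()` sends values that sit exactly halfway to the nearest even number instead. The sketch below reproduces this with the two middle revenue values from the dataset (Dell Technologies and Meta).

```python
import pandas as pd

# The two middle revenue values; their average is the median of all 14
middle = pd.Series([92224, 85965])

median = middle.median()
print(median)         # 89094.5

# Halfway values are rounded to the nearest even integer
print(round(median))  # 89094
print(round(89095.5)) # 89096
```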

##### With string datatypes:

In [72]:
# Find the first name of the Sector when ordered alphabetically 
df['Sector'].min()

'Chip Manufacturing'

In [73]:
# Find the last name of the Sector when ordered alphabetically 
df['Sector'].max()

'Software Services'

In [74]:
# Find the number of companies that have a sector entry
df['Sector'].count()

14

In [75]:
# Find the number of unique sector entries
df['Sector'].nunique()

3

#### c) Select Multiple Columns
To select multiple columns, use the [] operator with a list of column names as the argument. This creates another DataFrame.

<code>df[['column_name_1', 'column_name_2','column_name_3']]</code>

In order to select multiple columns in our dataframe we have to create a list.

In [76]:
# ERROR: without the inner list, pandas looks for a single column named ('Revenue', 'Employees', 'Country')
df['Revenue', 'Employees','Country']

KeyError: ('Revenue', 'Employees', 'Country')

In [None]:
# Ensure you use double square brackets [[]]
df[['Revenue', 'Employees','Country']]
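The number of brackets also decides what type comes back, even when only one column is requested. A minimal sketch on two of the lab's rows follows. The names `df_small`, `as_series` and `as_frame` are illustrative.

```python
import pandas as pd

df_small = pd.DataFrame(
    {'Revenue': [274515, 200734],
     'Employees': [147000, 267937]},
    index=['Apple', 'Samsung'])

# Single brackets with one name return a Series
as_series = df_small['Revenue']

# Double brackets (a list of names) return a DataFrame,
# even if the list holds only one name
as_frame = df_small[['Revenue']]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (2, 1)
```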

We can save this new DataFrame under a new variable name so that we can come back to it later.

<code>new_df = df[['column_name_1', 'column_name_2','column_name_3']]</code>

In [None]:
# We can save this dataframe to another dataframe
df_new = df[['Revenue', 'Employees','Country']]

In [None]:
# Now when we call our new dataframe 
df_new

Unnamed: 0,Revenue,Employees,Country
Apple,274515,147000,USA
Samsung,200734,267937,South Korea
Alphabet,182527,135301,USA
Foxconn,181945,878429,Taiwan
Microsoft,143015,163000,USA
Huawei,129184,197000,China
Dell Technologies,92224,158000,USA
Meta,85965,58604,USA
Sony,84893,109700,Japan
Hitachi,82345,350864,Japan


In [None]:
# While our original dataframe remains the same
df

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country
Apple,274515,147000,Consumer Electronics,01-04-1976,USA
Samsung,200734,267937,Consumer Electronics,13-01-1969,South Korea
Alphabet,182527,135301,Software Services,04-09-1998,USA
Foxconn,181945,878429,Chip Manufacturing,20-02-1974,Taiwan
Microsoft,143015,163000,Software Services,04-04-1975,USA
Huawei,129184,197000,Consumer Electronics,15-09-1987,China
Dell Technologies,92224,158000,Consumer Electronics,01-02-1984,USA
Meta,85965,58604,Software Services,04-02-2004,USA
Sony,84893,109700,Consumer Electronics,07-04-1946,Japan
Hitachi,82345,350864,Consumer Electronics,01-01-1910,Japan


#### d) Select Multiple Columns and Apply Methods
We can take a shortcut and apply methods on more than one column at the same time.

<code>df[['column_name_1', 'column_name_2','column_name_3']].mean()</code>

In [None]:
df[['Revenue', 'Employees']].mean()

Revenue      124420.642857
Employees    233616.642857
dtype: float64
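If several statistics are needed at once, one option is `agg()` with a list of method names. This sketch assumes a three-row subset of the lab's data, and `df_small` and `summary` are illustrative names.

```python
import pandas as pd

df_small = pd.DataFrame(
    {'Revenue': [274515, 200734, 182527],
     'Employees': [147000, 267937, 135301]},
    index=['Apple', 'Samsung', 'Alphabet'])

# One row per statistic, one column per selected column
summary = df_small[['Revenue', 'Employees']].agg(['min', 'max', 'mean'])

print(summary)
print(summary.loc['min', 'Revenue'])    # 182527.0
print(summary.loc['max', 'Employees'])  # 267937.0
```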

#### Knowledge check

##### 3. Select the column Employees

Let's practice these skills. Select the column Employees into the variable employees_s. You'll notice that the result of this selection is a Series.

In [None]:
employees_s = df['Employees']
type(employees_s)

pandas.core.series.Series

##### 4. Output the median Employees to the nearest whole number

Now, take it one step further and find the median of the column Employees, rounded to the nearest whole number. Store the result in the variable employees_median.



In [None]:
employees_median = round(employees_s.median())

##### 5. Calculate the mean for columns Revenue and Employees

Lastly, let's calculate the mean for the columns Revenue and Employees. Store the result in the variable r_e_mean.

Your result should be a Series.

In [None]:
r_e_mean = df[['Revenue', 'Employees']].mean()
r_e_mean

Revenue      124420.642857
Employees    233616.642857
dtype: float64

### Selection by Index

#### Selection by Index - loc
The .loc indexer is a pandas DataFrame property that allows users to select rows and columns by their labels. Selection by integer position is handled by .iloc, which is covered in the next section.

It is most commonly used when a user needs to access specific elements within a DataFrame, such as selecting all rows with a specific label or values in a specific column.

<code>df.loc[row_label, column_label]</code>

We can use : in place of row_label or column_label to call all the data.

<code>df.loc[:, column_label]</code>

<code>df.loc[row_label,:]</code>

We can also pass multiple columns in place of column_label or multiple rows in place of row_label.

<code>df.loc[['row_name_1', 'row_name_2','row_name_3'], column_label]</code>

<code>df.loc[row_label,['column_name_1', 'column_name_2','column_name_3']]</code>

In [None]:
# Find the revenue for Samsung 

# loc[row_label, column_label]
df.loc['Samsung','Revenue']

200734

Notice that if we use : in place of row_label, .loc returns all the data from the specified column, so the result is a Series.

In [81]:
df.loc[:, 'Revenue']

Apple                274515
Samsung              200734
Alphabet             182527
Foxconn              181945
Microsoft            143015
Huawei               129184
Dell Technologies     92224
Meta                  85965
Sony                  84893
Hitachi               82345
Intel                 77867
IBM                   73620
Tencent               69864
Panasonic             63191
Name: Revenue, dtype: int64

##### Selecting multiple Columns and Rows

In [None]:
# Multiple Rows
df.loc[['Apple', 'Sony', 'Samsung'], 'Revenue']

Apple      274515
Sony        84893
Samsung    200734
Name: Revenue, dtype: int64

In [None]:
# Multiple Columns
df.loc['Apple', ['Revenue', 'Employees', 'Founding Date']]

Revenue              274515
Employees            147000
Founding Date    01-04-1976
Name: Apple, dtype: object

In [None]:
# Custom rows and columns
rows = ['Apple', 'Sony', 'Samsung']
columns = ['Founding Date', 'Country', 'Employees']
df.loc[rows, columns]

Unnamed: 0,Founding Date,Country,Employees
Apple,01-04-1976,USA,147000
Sony,07-04-1946,Japan,109700
Samsung,13-01-1969,South Korea,267937
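Besides labels, .loc also accepts a boolean Series in the row position, which is one way to select rows by the values in a column. The sketch below keeps only the companies based in the USA, using four of the lab's rows. The names `df_small`, `is_usa` and `usa_revenue` are illustrative.

```python
import pandas as pd

df_small = pd.DataFrame(
    {'Revenue': [274515, 200734, 182527, 181945],
     'Country': ['USA', 'South Korea', 'USA', 'Taiwan']},
    index=['Apple', 'Samsung', 'Alphabet', 'Foxconn'])

# True for every row whose Country is 'USA'
is_usa = df_small['Country'] == 'USA'

# Rows where the mask is True, Revenue column only
usa_revenue = df_small.loc[is_usa, 'Revenue']

print(usa_revenue.index.tolist())  # ['Apple', 'Alphabet']
print(usa_revenue.tolist())        # [274515, 182527]
```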


##### Slicing

Slicing is a powerful feature of pandas that enables us to access specific parts of our DataFrame.

> start:stop:step

If we don't specify the step, the default value is 1. With .loc, both the start and the stop label are included in the result.

###### a) With columns
<code>df.loc[row_label, column_name_start:column_name_stop]</code>

In [None]:
# All values of Apple
df.loc['Apple', :]

Revenue                        274515
Employees                      147000
Sector           Consumer Electronics
Founding Date              01-04-1976
Country                           USA
Name: Apple, dtype: object

In [None]:
# All values of Apple from the Employees column to the Founding Date column
df.loc['Apple', 'Employees' : 'Founding Date']

Employees                      147000
Sector           Consumer Electronics
Founding Date              01-04-1976
Name: Apple, dtype: object

###### b) With rows
<code>df.loc[row_name_start:row_name_stop, column_label]</code>

In [None]:
# Apple to Sony Founding Dates
df.loc['Apple': 'Sony', 'Founding Date']

Apple                01-04-1976
Samsung              13-01-1969
Alphabet             04-09-1998
Foxconn              20-02-1974
Microsoft            04-04-1975
Huawei               15-09-1987
Dell Technologies    01-02-1984
Meta                 04-02-2004
Sony                 07-04-1946
Name: Founding Date, dtype: object
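Note that Sony, the stop label, appears in the output. This differs from position-based slicing, where the stop is left out. A minimal side-by-side sketch on four of the lab's rows follows, with `by_label` and `by_position` as illustrative names.

```python
import pandas as pd

df_small = pd.DataFrame(
    {'Revenue': [274515, 200734, 182527, 181945]},
    index=['Apple', 'Samsung', 'Alphabet', 'Foxconn'])

# Label slice: the stop label 'Alphabet' is included
by_label = df_small.loc['Apple':'Alphabet', 'Revenue']

# Position slice: the stop position 2 is excluded
by_position = df_small.iloc[0:2, 0]

print(by_label.index.tolist())     # ['Apple', 'Samsung', 'Alphabet']
print(by_position.index.tolist())  # ['Apple', 'Samsung']
```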

###### c) With step
<code>df.loc[row_label, column_name_start:column_name_stop:n]</code>

<code>df.loc[row_name_start:row_name_stop:n, column_label]</code>

In [None]:
# Only every second row between 'Apple' and 'Sony'
df.loc['Apple':'Sony':2, columns]

Unnamed: 0,Founding Date,Country,Employees
Apple,01-04-1976,USA,147000
Alphabet,04-09-1998,USA,135301
Microsoft,04-04-1975,USA,163000
Dell Technologies,01-02-1984,USA,158000
Sony,07-04-1946,Japan,109700


###### d) With step and :
<code>df.loc[:, column_name_start:column_name_stop:n]</code>

<code>df.loc[row_name_start:row_name_stop:n, :]</code>

In [None]:
# Only every second row between 'Apple' and 'Sony' but all columns
df.loc['Apple' : 'Sony' : 2, :]

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country
Apple,274515,147000,Consumer Electronics,01-04-1976,USA
Alphabet,182527,135301,Software Services,04-09-1998,USA
Microsoft,143015,163000,Software Services,04-04-1975,USA
Dell Technologies,92224,158000,Consumer Electronics,01-02-1984,USA
Sony,84893,109700,Consumer Electronics,07-04-1946,Japan


#### Knowledge check

##### 6. Select the Revenue, Employees & Sector for the companies Apple, Alphabet and Microsoft

Now let's put your .loc selection skills to work. Your task is to select the columns Revenue, Employees & Sector for the companies Apple, Alphabet and Microsoft. Your result should be stored in a variable index_selection and it should be a DataFrame.

In [None]:
cols = ['Revenue', 'Employees', 'Sector']
companies = ['Apple', 'Alphabet', 'Microsoft']

index_selection = df.loc[companies, cols]

### Selection by Position

#### Selection by Position - iloc
The .iloc indexer is a pandas DataFrame property that allows users to select rows and columns based on their integer positions, counting from 0.

> This is especially useful when users need to access elements within a DataFrame that do not have labels or specific column names.

<code>df.iloc[row_position, column_position]</code>
We can use : in place of row_position or column_position to call all the data.

<code>df.iloc[:, column_position]</code>

<code>df.iloc[row_position,:]</code>
We can also pass multiple columns in place of column_position or multiple rows in place of row_position.

<code>df.iloc[[row_position_1, row_position_2, row_position_3], column_position]</code>

<code>df.iloc[row_position, [column_position_1, column_position_2, column_position_3]]</code>

In [77]:
# Find the revenue for Samsung 
df.iloc[1, 0]

200734
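Counting positions by hand gets error-prone on wider tables. One way to look a position up from its label is `Index.get_loc`, sketched here on three of the lab's rows. The names `df_small`, `row_pos` and `col_pos` are illustrative.

```python
import pandas as pd

df_small = pd.DataFrame(
    {'Revenue': [274515, 200734, 182527],
     'Employees': [147000, 267937, 135301]},
    index=['Apple', 'Samsung', 'Alphabet'])

# Translate labels into integer positions
row_pos = df_small.index.get_loc('Samsung')    # 1
col_pos = df_small.columns.get_loc('Revenue')  # 0

# Same cell, reached by position and by label
print(df_small.iloc[row_pos, col_pos])      # 200734
print(df_small.loc['Samsung', 'Revenue'])   # 200734
```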

Notice that if we use : in place of row_position, .iloc again returns all the data from the specified column, so the result is a Series.

In [78]:
# All Revenue values as a Series
df.iloc[:,0]

Apple                274515
Samsung              200734
Alphabet             182527
Foxconn              181945
Microsoft            143015
Huawei               129184
Dell Technologies     92224
Meta                  85965
Sony                  84893
Hitachi               82345
Intel                 77867
IBM                   73620
Tencent               69864
Panasonic             63191
Name: Revenue, dtype: int64

Let's now use : in place of column_position and then in place of row_position:

In [79]:
# : in place of column_position returns every column for the row at position 1
df.iloc[1,:]

Revenue                        200734
Employees                      267937
Sector           Consumer Electronics
Founding Date              13-01-1969
Country                   South Korea
Name: Samsung, dtype: object

In [80]:
# : in place of row_position returns every row for the column at position 0
df.iloc[:,0]

Apple                274515
Samsung              200734
Alphabet             182527
Foxconn              181945
Microsoft            143015
Huawei               129184
Dell Technologies     92224
Meta                  85965
Sony                  84893
Hitachi               82345
Intel                 77867
IBM                   73620
Tencent               69864
Panasonic             63191
Name: Revenue, dtype: int64

##### Selecting multiple Columns and Rows

In [83]:
# Multiple rows
df.iloc[[0,1,8], 0]

Apple      274515
Samsung    200734
Sony        84893
Name: Revenue, dtype: int64

In [84]:
# Multiple columns
df.iloc[0, [1,4]]

Employees    147000
Country         USA
Name: Apple, dtype: object

In [85]:
# Custom rows and columns
rows_i = [0,1,8]
columns_i = [1,2,4]
df.iloc[rows_i,columns_i]

Unnamed: 0,Employees,Sector,Country
Apple,147000,Consumer Electronics,USA
Samsung,267937,Consumer Electronics,South Korea
Sony,109700,Consumer Electronics,Japan


##### Slicing

Slicing is a powerful feature of pandas that enables us to access specific parts of our DataFrame. With .iloc, the stop position is excluded from the result, as in ordinary Python slicing.

> start:stop:step

###### a) With columns
<code>df.iloc[row_position, column_position_start:column_position_stop]</code>

In [87]:
df.iloc[0, 1:4]

Employees                      147000
Sector           Consumer Electronics
Founding Date              01-04-1976
Name: Apple, dtype: object

###### b) With rows
<code>df.iloc[row_position_start:row_position_stop, column_position]</code>


In [88]:
df.iloc[0:8, 1]

Apple                147000
Samsung              267937
Alphabet             135301
Foxconn              878429
Microsoft            163000
Huawei               197000
Dell Technologies    158000
Meta                  58604
Name: Employees, dtype: int64

###### c) With step
<code>df.iloc[row_position, column_position_start:column_position_stop:n]</code>

<code>df.iloc[row_position_start:row_position_stop:n, column_position]</code>


In [89]:
# every second row between Apple and Sony
df.iloc[0:9:2, columns_i]

Unnamed: 0,Employees,Sector,Country
Apple,147000,Consumer Electronics,USA
Alphabet,135301,Software Services,USA
Microsoft,163000,Software Services,USA
Dell Technologies,158000,Consumer Electronics,USA
Sony,109700,Consumer Electronics,Japan


###### d) With step and :
<code>df.iloc[:, column_position_start:column_position_stop:n]</code>

<code>df.iloc[row_position_start:row_position_stop:n, :]</code>

In [90]:
df.iloc[0:9:2, :]

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country
Apple,274515,147000,Consumer Electronics,01-04-1976,USA
Alphabet,182527,135301,Software Services,04-09-1998,USA
Microsoft,143015,163000,Software Services,04-04-1975,USA
Dell Technologies,92224,158000,Consumer Electronics,01-02-1984,USA
Sony,84893,109700,Consumer Electronics,07-04-1946,Japan
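Positions can also be negative, in which case they count back from the end, just as with Python lists. The sketch below shows a few such selections on four of the lab's rows. The names `last_row`, `last_two` and `last_column` are illustrative.

```python
import pandas as pd

df_small = pd.DataFrame(
    {'Revenue': [274515, 200734, 182527, 181945],
     'Country': ['USA', 'South Korea', 'USA', 'Taiwan']},
    index=['Apple', 'Samsung', 'Alphabet', 'Foxconn'])

# -1 is the last position
last_row = df_small.iloc[-1, :]
print(last_row.name)          # Foxconn

# A negative start slices from the end: the last two rows of column 0
last_two = df_small.iloc[-2:, 0]
print(last_two.tolist())      # [182527, 181945]

# The same works for columns
last_column = df_small.iloc[:, -1]
print(last_column.name)       # Country
```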


#### Knowledge Check

##### 7. Perform a selection using .iloc and positional selection

Now it's time to put your iloc skills into practice. Your task is to select the companies in the 2nd, 4th and 6th positions, and the columns in the 1st, 2nd and last positions. Store your result in the variable position_selection.

In [91]:
companies = [1, 3, 5]
columns = [0, 1, -1]
position_selection = df.iloc[companies, columns]
position_selection

Unnamed: 0,Revenue,Employees,Country
Samsung,200734,267937,South Korea
Foxconn,181945,878429,Taiwan
Huawei,129184,197000,China
