In [2]:
import pandas as pd

### Selecting data in a Series
Let's create a simple Series and store it in a variable named `series_1`.

In [3]:
series_1 = pd.Series([2, 3, 4, 5, 6])
series_1

0    2
1    3
2    4
3    5
4    6
dtype: int64

In [4]:
series_1[3]

np.int64(5)
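`series_1` uses the default integer index, so `series_1[3]` looks up the *label* 3, which here happens to coincide with position 3. The minimal sketch below uses a hypothetical Series `s` whose integer labels run in reverse. With it, `.loc` (label-based) and `.iloc` (position-based) return different elements, which is why spelling out the intent explicitly can avoid surprises.

```python
import pandas as pd

# Hypothetical Series whose integer labels do not match positions
s = pd.Series([10, 20, 30], index=[2, 1, 0])

by_label = s.loc[0]      # label 0 is the last element -> 30
by_position = s.iloc[0]  # position 0 is the first element -> 10

print(by_label, by_position)  # 30 10
```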

In [5]:
sale = pd.Series([500, 85000, 40000, 9600, 100000],
                 index=['Cookie', 'Choco', 'Toothpaste', 'Soap', 'Toffee'])
sale

Cookie           500
Choco          85000
Toothpaste     40000
Soap            9600
Toffee        100000
dtype: int64

In [6]:
sale['Toffee']

np.int64(100000)
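Selecting a single label is only one option. As a small sketch rebuilding the same `sale` Series, a list of labels returns a sub-Series in the requested order. A label slice includes both endpoints. A boolean condition keeps only the matching entries.

```python
import pandas as pd

sale = pd.Series([500, 85000, 40000, 9600, 100000],
                 index=['Cookie', 'Choco', 'Toothpaste', 'Soap', 'Toffee'])

# A list of labels returns a smaller Series, in the order requested
two_items = sale[['Soap', 'Cookie']]

# A label slice includes both endpoints (Choco, Toothpaste, Soap)
middle = sale['Choco':'Soap']

# A boolean mask keeps only the entries where the condition is True
big_sellers = sale[sale > 10000]
print(big_sellers)
```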

### Selecting data in a data frame
As discussed before, a data frame is a two-dimensional data structure with rows and columns. Rows and columns are indexed separately, and either index can be used to select the data we need for our analysis.

Let's create a simple data frame with three columns: state, year, and pop (population).

In [7]:
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
data = pd.DataFrame(data)
data

|  | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2001 | 2.4 |
| 4 | Nevada | 2002 | 2.9 |
| 5 | Nevada | 2003 | 3.2 |

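The separate row and column indexes mentioned above can be inspected directly. This sketch rebuilds the same `data` frame and reads its `index` (row labels), `columns` (column labels) and `shape`.

```python
import pandas as pd

data = pd.DataFrame({'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
                     'year': [2000, 2001, 2002, 2001, 2002, 2003],
                     'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]})

print(data.index)    # row labels: a RangeIndex from 0 to 5
print(data.columns)  # column labels: state, year, pop
print(data.shape)    # (6, 3): six rows, three columns
```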

### Selecting columns in a data frame

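In our example, we will select the column 'state' from the data frame; the result shows all the values stored in that column. There are two ways to select a column: attribute (dot) notation, `data.state`, and bracket notation, `data['state']`. Dot notation only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method. The `pop` column below is such a clash: `data.pop` returns the bound `DataFrame.pop` method rather than the column, while `data['pop']` returns the column as expected.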

In [8]:
data.state

0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
5    Nevada
Name: state, dtype: object

In [9]:
data['state']

0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
5    Nevada
Name: state, dtype: object

In [10]:
data.pop

<bound method DataFrame.pop of     state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
5  Nevada  2003  3.2>
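A small check makes the name clash explicit. In this sketch, on a trimmed-down copy of `data`, `year` works with either notation. `data.pop` is the callable `DataFrame.pop` method, and only the bracket form returns the `pop` column as a Series. Bracket notation is the safer default for names like `pop`, `count`, or names containing spaces.

```python
import pandas as pd

data = pd.DataFrame({'state': ['Ohio', 'Nevada'],
                     'year': [2000, 2001],
                     'pop': [1.5, 2.4]})

# 'year' does not clash with any DataFrame attribute, so both forms agree
assert data.year.equals(data['year'])

# 'pop' clashes with the DataFrame.pop method, so dot notation returns the method
print(callable(data.pop))          # True
print(type(data['pop']).__name__)  # Series
```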

In [11]:
data['pop']

0    1.5
1    1.7
2    3.6
3    2.4
4    2.9
5    3.2
Name: pop, dtype: float64

In [12]:
data[['pop', 'state']]

|  | pop | state |
|---|---|---|
| 0 | 1.5 | Ohio |
| 1 | 1.7 | Ohio |
| 2 | 3.6 | Ohio |
| 3 | 2.4 | Nevada |
| 4 | 2.9 | Nevada |
| 5 | 3.2 | Nevada |

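The number of brackets decides the result type. As a sketch on a hypothetical two-row frame, a single label returns a Series. A list of labels, even a list with one label, returns a DataFrame whose columns follow the order of the list.

```python
import pandas as pd

data = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'pop': [1.5, 2.4]})

as_series = data['pop']             # one label -> Series
as_frame = data[['pop']]            # list with one label -> one-column DataFrame
reordered = data[['pop', 'state']]  # columns come back in the order listed
```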

In [14]:
df = pd.read_csv('../survey_results_public.csv', usecols=['Age','ConvertedComp', 'Country', 'LanguageWorkedWith', 'YearsCode'])
df

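|  | Age | ConvertedComp | Country | LanguageWorkedWith | YearsCode |
|---|---|---|---|---|---|
| 0 | NaN | NaN | Germany | C#;HTML/CSS;JavaScript | 36 |
| 1 | NaN | NaN | United Kingdom | JavaScript;Swift | 7 |
| 2 | NaN | NaN | Russian Federation | Objective-C;Python;Swift | 4 |
| 3 | 25.0 | NaN | Albania | NaN | 7 |
| 4 | 31.0 | NaN | United States | HTML/CSS;Ruby;SQL | 15 |
| ... | ... | ... | ... | ... | ... |
| 64456 | NaN | NaN | United States | NaN | 10 |
| 64457 | NaN | NaN | Morocco | Assembly;Bash/Shell/PowerShell;C;C#;C++;Dart;G... | NaN |
| 64458 | NaN | NaN | Viet Nam | NaN | NaN |
| 64459 | NaN | NaN | Poland | HTML/CSS | NaN |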

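The survey file is not reproduced here. This sketch uses a small made-up CSV in an `io.StringIO` buffer to illustrate `usecols`: only the listed columns are parsed. Note that they come back in the file's column order, not in the order of the `usecols` list. Indexing with a list afterwards restores a specific order if needed.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the survey file
csv_text = """Respondent,Country,Age,YearsCode
1,Germany,,36
2,Albania,25,7
"""

subset = pd.read_csv(io.StringIO(csv_text), usecols=['YearsCode', 'Country'])
print(list(subset.columns))  # ['Country', 'YearsCode'] (file order)

# Select with a list afterwards if a particular column order matters
ordered = subset[['YearsCode', 'Country']]
```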

### iloc

Let's get the value stored at row position 1 and column position 2 using `iloc`. Positions are zero-based, so this is the second row of the third column, Country.

In [18]:
dato = df.iloc[1,2]
dato

'United Kingdom'
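A few more positional idioms, shown on a hypothetical three-row frame with the same first three columns as `df`. Negative positions count from the end. `columns.get_loc` looks up a column's position by name. `iat` is a scalar-only counterpart of `iloc`.

```python
import pandas as pd

# Small hypothetical frame with the same leading columns as df
df = pd.DataFrame({'Age': [None, None, 25.0],
                   'ConvertedComp': [None, None, None],
                   'Country': ['Germany', 'United Kingdom', 'Albania']})

print(df.iloc[1, 2])   # United Kingdom (row position 1, column position 2)
print(df.iloc[-1, 2])  # Albania (negative positions count from the end)

# Look up a column's position by name instead of hard-coding it
col = df.columns.get_loc('Country')
print(df.iat[1, col])  # United Kingdom (iat accesses a single value by position)
```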

In [19]:
dato = df.iloc[2:9,2]
dato

2    Russian Federation
3               Albania
4         United States
5               Germany
6                 India
7         United States
8               Tunisia
Name: Country, dtype: object
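`iloc` slices follow Python's usual half-open rule. The start position is included and the stop position is excluded, which is why `df.iloc[2:9, 2]` returns seven rows (2 to 8). This sketch uses a hypothetical ten-row frame. It also shows slicing on both axes at once, and that a stop beyond the last row is clipped rather than raising an error, as in `df.iloc[1:50000]`.

```python
import pandas as pd

# Hypothetical ten-row frame with a default RangeIndex
df = pd.DataFrame({'n': range(10), 'sq': [i * i for i in range(10)]})

rows = df.iloc[2:9, 0]     # positions 2..8; stop position 9 is excluded
print(len(rows))           # 7

block = df.iloc[2:4, 0:2]  # slices work on both axes at once
print(block.shape)         # (2, 2)

tail = df.iloc[1:50000]    # an out-of-range stop is clipped, not an error
print(len(tail))           # 9
```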

In [20]:
df_subset = df.iloc[1:50000]
df_subset

|  | Age | ConvertedComp | Country | LanguageWorkedWith | YearsCode |
|---|---|---|---|---|---|
| 1 | NaN | NaN | United Kingdom | JavaScript;Swift | 7 |
| 2 | NaN | NaN | Russian Federation | Objective-C;Python;Swift | 4 |
| 3 | 25.0 | NaN | Albania | NaN | 7 |
| 4 | 31.0 | NaN | United States | HTML/CSS;Ruby;SQL | 15 |
| 5 | NaN | NaN | Germany | HTML/CSS;Java;JavaScript | 6 |
| ... | ... | ... | ... | ... | ... |
| 49995 | 29.0 | 27864.0 | Croatia | HTML/CSS;JavaScript;PHP;SQL;TypeScript | 8 |
| 49996 | NaN | 112456.0 | United Kingdom | Bash/Shell/PowerShell;Python | 40 |
| 49997 | 56.0 | 131844.0 | United Kingdom | Bash/Shell/PowerShell;C#;HTML/CSS;Java;JavaScr... | 41 |
| 49998 | 42.0 | 82000.0 | United States | Java;JavaScript;SQL | 26 |


### loc

As discussed, `loc` selects rows and columns by label. Because `df` still has the default integer index, its row labels happen to match the row positions. Let's select the LanguageWorkedWith values for rows 500 to 550.

Why don't we need to write 551 to include row 550? Slices in `loc` are label-based and include both endpoints, even when the labels are integers. So `df.loc[500:550]` returns rows 500 through 550, whereas `df.iloc[500:550]` follows Python's slicing rules and stops at row 549.

In [21]:
subset = df.loc[500:550, 'LanguageWorkedWith']
subset

500                                          C#;HTML/CSS
501            Bash/Shell/PowerShell;C#;Dart;Java;Kotlin
502                                                  SQL
503       C++;HTML/CSS;Java;JavaScript;Python;TypeScript
504               C#;HTML/CSS;Java;JavaScript;Python;SQL
505                C;C++;HTML/CSS;Java;JavaScript;Python
506                                               Python
507                                  HTML/CSS;JavaScript
508                   Bash/Shell/PowerShell;C;C++;Python
509                             HTML/CSS;JavaScript;Ruby
510                         Bash/Shell/PowerShell;Python
511                C#;HTML/CSS;JavaScript;SQL;TypeScript
512         JavaScript;Kotlin;Python;Ruby;SQL;TypeScript
513    Bash/Shell/PowerShell;C#;HTML/CSS;JavaScript;K...
514       Bash/Shell/PowerShell;C#;JavaScript;Python;SQL
515    Bash/Shell/PowerShell;C#;Go;Java;Kotlin;Python...
516                                               Python
517                Bash/Shell/P
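The inclusive-versus-exclusive difference is easy to confirm. This sketch builds a hypothetical 1,000-row frame with a default RangeIndex (like `df` before `set_index`). It compares the same bounds passed to `loc` and to `iloc`.

```python
import pandas as pd

# Hypothetical frame with a default RangeIndex, like df before set_index
df = pd.DataFrame({'x': range(1000)})

by_label = df.loc[500:550, 'x']    # labels 500..550, both ends included
by_position = df.iloc[500:550, 0]  # positions 500..549, stop excluded

print(len(by_label), len(by_position))  # 51 50
```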

In [22]:
subset = df.loc[[1,2,3,4,5], ['Age', 'Country', 'ConvertedComp']]
subset

|  | Age | Country | ConvertedComp |
|---|---|---|---|
| 1 | NaN | United Kingdom | NaN |
| 2 | NaN | Russian Federation | NaN |
| 3 | 25.0 | Albania | NaN |
| 4 | 31.0 | United States | NaN |
| 5 | NaN | Germany | NaN |

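Besides lists of labels, `loc` also accepts a boolean mask for the rows. This sketch uses a hypothetical four-row frame with the same column names as `df`. It keeps only rows that satisfy a condition; combined conditions use `&` with each comparison in parentheses. Missing ages (NaN) compare as False, so those rows are dropped.

```python
import pandas as pd

# Hypothetical rows with the same column names as df
df = pd.DataFrame({'Age': [None, 25.0, 31.0, 42.0],
                   'Country': ['United Kingdom', 'Albania', 'United States', 'United States'],
                   'ConvertedComp': [None, None, None, 82000.0]})

# Rows where Age is above 30, restricted to two columns
older = df.loc[df['Age'] > 30, ['Age', 'Country']]

# Combine conditions with & and parentheses
older_paid = df.loc[(df['Age'] > 30) & df['ConvertedComp'].notna(), 'Country']
print(older)
```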

### Bonus Section
So far, we have selected rows by their integer index labels. In some scenarios, we may want to use a column as the index instead. Let's make the Country column the index. Note that the Country values will now live in the index, so Country will no longer appear as a regular column in the data frame. Passing `inplace=True` modifies `df` directly instead of returning a new data frame.

In [23]:
df.set_index('Country', inplace=True)
df

| Country | Age | ConvertedComp | LanguageWorkedWith | YearsCode |
|---|---|---|---|---|
| Germany | NaN | NaN | C#;HTML/CSS;JavaScript | 36 |
| United Kingdom | NaN | NaN | JavaScript;Swift | 7 |
| Russian Federation | NaN | NaN | Objective-C;Python;Swift | 4 |
| Albania | 25.0 | NaN | NaN | 7 |
| United States | 31.0 | NaN | HTML/CSS;Ruby;SQL | 15 |
| ... | ... | ... | ... | ... |
| United States | NaN | NaN | NaN | 10 |
| Morocco | NaN | NaN | Assembly;Bash/Shell/PowerShell;C;C#;C++;Dart;G... | NaN |
| Viet Nam | NaN | NaN | NaN | NaN |
| Poland | NaN | NaN | HTML/CSS | NaN |

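Without `inplace=True`, `set_index` returns a new data frame and leaves the original untouched. This sketch uses a hypothetical three-row frame. It shows that the new index takes the column's name and that `reset_index` moves the index back into an ordinary column. The index is also not required to be unique: Germany appears twice, just as United States repeats in the survey data.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Germany', 'Albania', 'Germany'],
                   'Age': [None, 25.0, 31.0]})

# Without inplace=True, set_index returns a new frame and leaves df unchanged
by_country = df.set_index('Country')
print('Country' in df.columns)          # True
print('Country' in by_country.columns)  # False
print(by_country.index.name)            # Country

# reset_index moves the index back into an ordinary column
restored = by_country.reset_index()
```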

In [24]:
df.loc[['Morocco', 'United States'], 'Age': 'LanguageWorkedWith']

| Country | Age | ConvertedComp | LanguageWorkedWith |
|---|---|---|---|
| Morocco | 26.0 | 8256.0 | Bash/Shell/PowerShell;HTML/CSS;JavaScript;PHP;SQL |
| Morocco | 28.0 | 21312.0 | C#;HTML/CSS;Java;JavaScript;PHP;R;SQL;TypeScript |
| Morocco | NaN | NaN | HTML/CSS;JavaScript;Python;SQL |
| Morocco | 21.0 | NaN | HTML/CSS;Java;JavaScript;Kotlin;SQL |
| Morocco | 20.0 | NaN | C;C#;C++;HTML/CSS;Java;JavaScript;PHP;SQL |
| ... | ... | ... | ... |
| United States | 32.0 | NaN | Assembly;Bash/Shell/PowerShell;C;C#;C++;Dart;G... |
| United States | 29.0 | NaN | Haskell |
| United States | NaN | NaN | Bash/Shell/PowerShell;HTML/CSS;PHP |
| United States | NaN | NaN | C++;HTML/CSS;Java;JavaScript;Python;SQL |

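The last selection combines two label-based features. Because a country label repeats in the index, each label in the list returns every matching row. The column slice `'Age':'LanguageWorkedWith'` includes both endpoint columns. This sketch reproduces that on a hypothetical four-row frame. It also shows that a repeated label on its own yields a DataFrame, while a unique label yields a Series.

```python
import pandas as pd

# Hypothetical rows; Morocco appears twice in the index
df = pd.DataFrame({'Country': ['Morocco', 'Poland', 'United States', 'Morocco'],
                   'Age': [26.0, None, 29.0, 21.0],
                   'ConvertedComp': [8256.0, None, None, None],
                   'LanguageWorkedWith': ['PHP;SQL', 'HTML/CSS', 'Haskell', 'Kotlin;SQL'],
                   'YearsCode': ['5', None, '10', '3']}).set_index('Country')

# Every row for each listed label; the column slice includes both ends
picked = df.loc[['Morocco', 'United States'], 'Age':'LanguageWorkedWith']
print(picked)

morocco = df.loc['Morocco']  # DataFrame, because the label repeats
poland = df.loc['Poland']    # Series, because the label is unique
```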