# Pandas Dataframe Properties

### Dataframe properties

👉🏻 `loc` and `iloc` are DataFrame indexers for `selecting rows and columns`. They are accessed as properties with `[]`, not called as methods.

In [2]:
import pandas as pd

In [4]:
ufo = pd.read_csv("http://bit.ly/uforeports")

In [5]:
ufo.head(3)

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00


## `loc` property

`Docstring`: Access a group of rows and columns by label(s) or a boolean array.

`Type`: property

`loc` selects rows and columns `by label`.

A label is the `index` value for `rows` and the `column name` for `columns`.

Format: `[]` notation is used: [`rows` `,` `columns`]

### single row

In [6]:
ufo.loc[0, :] # ":" means all columns; this returns a pandas Series. Conclusion: a row is a Series, just like a column.

City                       Ithaca
Colors Reported               NaN
Shape Reported           TRIANGLE
State                          NY
Time               6/1/1930 22:00
Name: 0, dtype: object

### multiple rows

In [6]:
ufo.loc[[1,2,3], :] # for multiple rows, pass a list of labels

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00


In [7]:
ufo.loc[0:2, :] # rows with labels 0 through 2, inclusive

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00


⚠️ `loc[0:2]` `returns rows 0, 1 and 2`, `unlike` `range`, which stops one before the end value.

In [None]:
👉🏻 loc slices are inclusive on both sides
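
The inclusive behaviour can be checked without downloading anything. The following is a minimal sketch on a small hand-built frame; `sample` is an illustrative stand-in, not the UFO dataset itself.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of the UFO data.
sample = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "DISK"],
    "State": ["NY", "NJ", "CO", "KS"],
})

# Label slice: both endpoints are included, so labels 0, 1 and 2 come back.
loc_rows = sample.loc[0:2, :]

# range stops one before the end value, so it yields only 0 and 1.
range_values = list(range(0, 2))

# Column label slices are inclusive too: "State" is part of the result.
loc_cols = sample.loc[:, "City":"State"]

print(len(loc_rows), range_values, list(loc_cols.columns))
```

The same rule applies to row labels and column labels, which is why `"City":"State"` in the cells that follow includes the `State` column.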

### column selection

In [12]:
ufo.loc[:, "City"].head()

0                  Ithaca
1             Willingboro
2                 Holyoke
3                 Abilene
4    New York Worlds Fair
Name: City, dtype: object

In [13]:
ufo.loc[:, ["City", "State"]].head()

Unnamed: 0,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO
3,Abilene,KS
4,New York Worlds Fair,NY


In [15]:
ufo.loc[:, "City": "State"].head()

Unnamed: 0,City,Colors Reported,Shape Reported,State
0,Ithaca,,TRIANGLE,NY
1,Willingboro,,OTHER,NJ
2,Holyoke,,OVAL,CO
3,Abilene,,DISK,KS
4,New York Worlds Fair,,LIGHT,NY


### rows&columns combination

In [18]:
ufo.loc[0:2, "City": "State"].head()

Unnamed: 0,City,Colors Reported,Shape Reported,State
0,Ithaca,,TRIANGLE,NY
1,Willingboro,,OTHER,NJ
2,Holyoke,,OVAL,CO


In [20]:
ufo.head(3).drop("Time", axis=1)

Unnamed: 0,City,Colors Reported,Shape Reported,State
0,Ithaca,,TRIANGLE,NY
1,Willingboro,,OTHER,NJ
2,Holyoke,,OVAL,CO


### Using loc with conditionals

In [24]:
ufo[ufo.City=="Oakland"].head(2)# this is without loc

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
1694,Oakland,,CIGAR,CA,7/21/1968 14:00
2144,Oakland,,DISK,CA,8/19/1971 0:00


In [27]:
ufo.loc[ufo.City=="Oakland", :].head(2)# the same result but more explicit

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
1694,Oakland,,CIGAR,CA,7/21/1968 14:00
2144,Oakland,,DISK,CA,8/19/1971 0:00


In [28]:
ufo.loc[ufo.City=="Oakland", "State"].head(2) # flexible: rows and columns are chosen in a single call, which avoids chained indexing

1694    CA
2144    CA
Name: State, dtype: object

In [30]:
ufo[ufo.City=="Oakland"].State.head(2) # same result, but this is chained indexing, which causes problems in certain scenarios

1694    CA
2144    CA
Name: State, dtype: object
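
The scenario where chained indexing usually bites is assignment. As a hedged sketch, assuming a tiny hand-built frame (`sample` and the replacement value `"California"` are made up for illustration), a single `loc` call updates the original frame reliably:

```python
import pandas as pd

# Hypothetical mini version of the UFO data.
sample = pd.DataFrame({
    "City": ["Oakland", "Ithaca", "Oakland"],
    "State": ["CA", "NY", "CA"],
})

# One indexing call: pandas knows the assignment targets `sample` itself.
sample.loc[sample.City == "Oakland", "State"] = "California"

# The chained form would be:
#     sample[sample.City == "Oakland"]["State"] = "California"
# The first [] may return a copy, so the assignment can be lost
# (pandas typically emits a warning about it).

print(sample)
```

Reading values through a chain, as in the cell above, gives the same result as `loc`; the risk is in writing through one.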

## `iloc` property

`iloc` filters rows and selects columns by integer position.

The `i` stands for integer.

**iloc** Type: property. Purely integer-location based indexing for selection by position.

The logic is the same: which rows do I want, and which columns do I want?

In [32]:
ufo.iloc[:, [0,3]].head()

Unnamed: 0,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO
3,Abilene,KS
4,New York Worlds Fair,NY


⚠️ `iloc` slices exclude the stop position: `0:3` returns positions 0, 1 and 2.

In [33]:
ufo.iloc[0:3, :]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00


### selecting columns

In [None]:
inner brackets: ["City", "State"] is a list of column names
outer brackets: select those columns from the DataFrame

In [38]:
ufo[["City", "State"]].head()

Unnamed: 0,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO
3,Abilene,KS
4,New York Worlds Fair,NY


In [40]:
ufo.loc[:, ["City", "State"]].head()# this code is more explicit

Unnamed: 0,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO
3,Abilene,KS
4,New York Worlds Fair,NY


In [42]:
ufo[0:2] # slices rows, not columns; easy to misremember because [] otherwise selects columns, so avoid it

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00


In [43]:
ufo.iloc[0:2, :] # much better: the row and column selection is explicit

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
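
The two meanings of plain `[]` can be seen side by side. This is a small sketch on a made-up frame (`df_demo` is a hypothetical name) showing that a slice inside `[]` picks rows while a list picks columns, and that the explicit `iloc`/`loc` forms return the same data:

```python
import pandas as pd

df_demo = pd.DataFrame({"a": [1, 100, 1000], "b": [2, 200, 2000]})

rows_by_slice = df_demo[0:2]        # a slice inside [] selects rows
cols_by_list = df_demo[["a", "b"]]  # a list inside [] selects columns

# Explicit equivalents that state rows and columns separately.
rows_explicit = df_demo.iloc[0:2, :]
cols_explicit = df_demo.loc[:, ["a", "b"]]

print(rows_by_slice.equals(rows_explicit), cols_by_list.equals(cols_explicit))
```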


## `ix` property

It allowed us to `mix` `labels and integers`. It was deprecated in pandas 0.20.0 and removed in pandas 1.0.0.

In [46]:
drinks = pd.read_csv("http://bit.ly/drinksbycountry", index_col="country")
drinks.head()

country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
Afghanistan,0,0,0,0.0,Asia
Albania,89,132,54,4.9,Europe
Algeria,25,0,14,0.7,Africa
Andorra,245,138,312,12.4,Europe
Angola,217,57,45,5.9,Africa


In [47]:
drinks.ix["Albania", 0] # deprecated, then removed: raises AttributeError on pandas 1.0 and later

AttributeError: 'DataFrame' object has no attribute 'ix'
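
Since `ix` is gone, a mixed label/position lookup has to be written with `loc` or `iloc`. One possible way, sketched on a hand-built stand-in for the drinks data (`drinks_demo` is a hypothetical name holding the first three rows shown above), is to convert one side so that both are labels or both are positions:

```python
import pandas as pd

# Hypothetical stand-in for the first rows of the drinks data, indexed by country.
drinks_demo = pd.DataFrame(
    {"beer_servings": [0, 89, 25], "spirit_servings": [0, 132, 0]},
    index=pd.Index(["Afghanistan", "Albania", "Algeria"], name="country"),
)

# Option 1: turn the column position into a label, then use loc.
via_loc = drinks_demo.loc["Albania", drinks_demo.columns[0]]

# Option 2: turn the row label into a position, then use iloc.
via_iloc = drinks_demo.iloc[drinks_demo.index.get_loc("Albania"), 0]

print(via_loc, via_iloc)
```

Both options return Albania's `beer_servings` value. `get_loc` returns a single integer here because the country labels are unique.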

In [3]:
mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
            {'a': 100, 'b': 200, 'c': 300, 'd': 400},
            {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]

In [4]:
mydict

[{'a': 1, 'b': 2, 'c': 3, 'd': 4},
 {'a': 100, 'b': 200, 'c': 300, 'd': 400},
 {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]

In [5]:
df = pd.DataFrame(mydict)

In [6]:
df

Unnamed: 0,a,b,c,d
0,1,2,3,4
1,100,200,300,400
2,1000,2000,3000,4000


**Indexing just the rows**

With a scalar integer.

In [9]:
df.iloc

<pandas.core.indexing._iLocIndexer at 0x2c7e1b6f228>

In [10]:
df.iloc[0]

a    1
b    2
c    3
d    4
Name: 0, dtype: int64

In [11]:
type(df.iloc[0])

pandas.core.series.Series

With a list of integers.

In [12]:
df.iloc[[0]]

Unnamed: 0,a,b,c,d
0,1,2,3,4


In [13]:
df.iloc[[0,1]]

Unnamed: 0,a,b,c,d
0,1,2,3,4
1,100,200,300,400


With a `slice` object.

In [15]:
df.iloc[2:3]

Unnamed: 0,a,b,c,d
2,1000,2000,3000,4000


With a boolean mask the same length as the index.

In [16]:
df.iloc[[True, False, True]]

Unnamed: 0,a,b,c,d
0,1,2,3,4
2,1000,2000,3000,4000


With a callable, useful in method chains. The `x` passed
to the ``lambda`` is the DataFrame being sliced. This selects
the rows whose index label is even.

In [17]:
df.iloc[lambda x: x.index %2 ==0]

Unnamed: 0,a,b,c,d
0,1,2,3,4
2,1000,2000,3000,4000
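
The callable form pays off when the frame being sliced is an intermediate result with no name of its own. As an illustrative sketch (the sort step is an assumption added for the example), the `lambda` below receives the sorted frame and picks its first and last rows by position:

```python
import pandas as pd

df = pd.DataFrame([
    {"a": 1, "b": 2, "c": 3, "d": 4},
    {"a": 100, "b": 200, "c": 300, "d": 400},
    {"a": 1000, "b": 2000, "c": 3000, "d": 4000},
])

# x is the sorted frame, not the original df, so len(x) - 1 is its last position.
first_and_last = (
    df.sort_values("a", ascending=False)
      .iloc[lambda x: [0, len(x) - 1]]
)

print(first_and_last)
```

The result keeps the original index labels (2 and 0), which shows that `iloc` counted positions in the sorted order rather than looking up labels.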


**Indexing both axes**

You can mix the indexer types for the index and columns. Use ``:`` to
select the entire axis.

With scalar integers.

In [22]:
df

Unnamed: 0,a,b,c,d
0,1,2,3,4
1,100,200,300,400
2,1000,2000,3000,4000


In [23]:
df.iloc[0]

a    1
b    2
c    3
d    4
Name: 0, dtype: int64

In [25]:
df.iloc[1:]

Unnamed: 0,a,b,c,d
1,100,200,300,400
2,1000,2000,3000,4000


In [18]:
df.iloc[0,1]

2

In [26]:
df.iloc[2,3]

4000

In [33]:
df.iloc[0,2]

3

In [34]:
df.iloc[1,3]

400

With lists of integers.

In [19]:
df.iloc[[0, 2], [1, 3]]

Unnamed: 0,b,d
0,2,4
2,2000,4000


With `slice` objects.

In [20]:
df.iloc[1:3, 0:3]

Unnamed: 0,a,b,c
1,100,200,300
2,1000,2000,3000


In [None]:
• iloc: index position
Type: property
String form: <property object at 0x000002131A4B91D8>
Docstring: Purely integer-location based indexing for selection by position.

df.iloc[2] returns the same result as df.loc[2] here, because df has the default integer index (0, 1, 2).
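
`iloc` and `loc` only agree when the index labels match the row positions. A short sketch, using a made-up `relabeled` frame whose labels are deliberately shuffled, shows where they diverge:

```python
import pandas as pd

default_idx = pd.DataFrame({"a": [1, 100, 1000]})
# Same data, but with hypothetical index labels that do not match the positions.
relabeled = pd.DataFrame({"a": [1, 100, 1000]}, index=[2, 0, 1])

# Default index: position 2 and label 2 are the same row.
same_row = default_idx.iloc[2].equals(default_idx.loc[2])

# Shuffled labels: position 2 is the third row, label 2 is the first row.
by_position = relabeled.iloc[2, 0]
by_label = relabeled.loc[2, "a"]

print(same_row, by_position, by_label)
```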
