# Methods, Functions, and Attributes 


> ### Everything is an Object

## 1. Functions

Some of the most used pandas functions:

| Command                     | Description                                   |
|-----------------------------|-----------------------------------------------|
| df = pd.DataFrame(...)     | DataFrame creation                           |
| x = pd.Series(...)         | Series creation                               |
| pd.read_csv(file)          | read a table from disk                       |
| pd.unique(values)          | returns unique values in 1d array-like object|
| pd.concat([df1, df2])      | used to combine Series or DataFrame objects |


#### Example: `pd.read_csv()` 

- `pd.read_csv()` has one required parameter, `filepath_or_buffer`, which is the path to the file to be read
- the function is standalone: it does not depend on an existing object
- it takes a file path as its input argument and returns the data from that file in tabular form (a DataFrame)
- the function 'lives' in the pandas library
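
As the parameter name `filepath_or_buffer` suggests, the function also accepts a file-like object. The sketch below uses that to show the call without needing a file on disk. The CSV text and the variable names `csv_text` and `countries` are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical two-row CSV held in memory instead of a file on disk
csv_text = "country,population\nJapan,126573481\nMexico,127017224\n"

# read_csv is called on the library (pd), not on an existing DataFrame
countries = pd.read_csv(io.StringIO(csv_text))

print(type(countries))
print(countries.shape)
```

The call starts from the library name `pd`, which is what makes it a function rather than a method of some DataFrame.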

In [15]:
import pandas as pd
import pandas

In [3]:
# let's pass a relative filepath to the large_countries_2015.csv 

pd.read_csv('./lecture_data/large_countries_2015.csv') 

Unnamed: 0,country,population,fertility,continent
0,Bangladesh,160995600.0,2.12,Asia
1,Brazil,207847500.0,1.78,South America
2,China,1376049000.0,1.57,Asia
3,India,1311051000.0,2.43,Asia
4,Indonesia,257563800.0,2.28,Asia
5,Japan,126573500.0,1.45,Asia
6,Mexico,127017200.0,2.13,North America
7,Nigeria,182202000.0,5.89,Africa
8,Pakistan,188924900.0,3.04,Asia
9,Philippines,100699400.0,2.98,Asia


In [4]:
# Save the dataframe in a variable

df = pd.read_csv('./lecture_data/large_countries_2015.csv') 

In [5]:
df

Unnamed: 0,country,population,fertility,continent
0,Bangladesh,160995600.0,2.12,Asia
1,Brazil,207847500.0,1.78,South America
2,China,1376049000.0,1.57,Asia
3,India,1311051000.0,2.43,Asia
4,Indonesia,257563800.0,2.28,Asia
5,Japan,126573500.0,1.45,Asia
6,Mexico,127017200.0,2.13,North America
7,Nigeria,182202000.0,5.89,Africa
8,Pakistan,188924900.0,3.04,Asia
9,Philippines,100699400.0,2.98,Asia


### Other function examples

In [8]:
# check the length of the dataframe

len(df)


12

In [9]:
# what if we run the function without the parentheses?

len

<function len(obj, /)>

In [10]:
# also save it in a variable

length_df = len(df)


In [None]:
length_df

Python built-in functions: https://docs.python.org/3/library/functions.html

### Functions related to packages and libraries

In [11]:
# pd.DataFrame() from a dictionary

pd.DataFrame(
    {'a':[1,2,3],
     'b':[4,5,6]
    }
)

Unnamed: 0,a,b
0,1,4
1,2,5
2,3,6


In [12]:
# pd.Series() from a list

pd.Series([1,2,3,4,5,6])


0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64

**Comments:**
- Python has built-in functions, as we have seen (`len`, `sum`, `input`, ...)
- most Python libraries provide functions of their own
- programmers can write custom functions to make their work easier
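
A minimal sketch of the three kinds of functions side by side. The helper `population_in_millions` is a hypothetical custom function written only for this illustration.

```python
import pandas as pd


def population_in_millions(population):
    """Convert a raw head count to millions, rounded to one decimal."""
    return round(population / 1_000_000, 1)


# Built-in function: available without any import
n_items = len([1, 2, 3])

# Library function: lives in pandas and is reached through pd
populations = pd.Series([126573481, 127017224])

# Custom function: defined above by us
japan_millions = population_in_millions(126573481)

print(n_items, japan_millions)
```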

## 2. Methods
- they are always associated with an object
- they have access to the object they are run on

The difference between a function and a method is that a function is given data as an argument and transforms it, whereas a method performs the transformation on the object it is associated with. In this case that object is a pandas DataFrame.

| Command                | Description                                                  |
|------------------------|--------------------------------------------------------------|
| df.to_csv(file)        | write a table to disk                                        |
| df.sum()               | returns the sum of the values over the requested axis        |
| df.sort_values()       | sorts by the values along either axis                        |
| df.count()             | returns count non-NA cells for each column or row            |
| df.nunique()           | returns the number of unique values in a Series or DataFrame|
| df['col'].str.len()    | returns the length of each string in pandas Series           |
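
The function/method distinction can be sketched on a small DataFrame. The name `toy` and its contents are made up for this example.

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Function: the object is passed in as an argument
n_rows = len(toy)

# Method: called on the object itself with dot notation
column_sums = toy.sum()

print(n_rows)
print(column_sums)
```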


### Notes:

- When applying Python string methods to a pandas Series, the method must be preceded by the `.str` accessor in order to be called correctly.
- Keep in mind that many, but not all, pandas methods can be applied to both pandas DataFrames and Series.
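
A short sketch of the `.str` accessor, assuming a made-up Series called `continents`. The string method is applied to every element at once.

```python
import pandas as pd

continents = pd.Series(["Asia", "South America", "Africa"])

# Plain Python string method on a single string
print("Asia".upper())

# The Series itself has no .upper() method ...
print(hasattr(continents, "upper"))

# ... so string methods are reached through the .str accessor
upper = continents.str.upper()
lengths = continents.str.len()

print(upper)
print(lengths)
```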

In our example `df` is the object in question. `df` is a DataFrame object. It is a small DataFrame and, like every object in Python, it is constrained by the design of its datatype.

In [13]:
# what is the datatype of df?

type(df)


pandas.core.frame.DataFrame

If you type the variable name of an object followed by a period in a Jupyter notebook, pressing the Tab key opens a dropdown-like menu of all the available methods and attributes. Try it below.

In [None]:
df.

In [14]:
pandas.

SyntaxError: invalid syntax (222040558.py, line 1)
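
Where tab completion is not available, the built-in `dir()` function gives a similar listing. This is a sketch on a made-up DataFrame `toy`. Filtering out names that start with an underscore is a convention chosen here, not something pandas requires.

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2, 3]})

# Public names only: skip anything starting with an underscore
public_names = [name for name in dir(toy) if not name.startswith("_")]

print(len(public_names))
print(public_names[:10])
```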

### Inspect the content of the dataframe

In [16]:
# display the first few rows

df.head()


Unnamed: 0,country,population,fertility,continent
0,Bangladesh,160995600.0,2.12,Asia
1,Brazil,207847500.0,1.78,South America
2,China,1376049000.0,1.57,Asia
3,India,1311051000.0,2.43,Asia
4,Indonesia,257563800.0,2.28,Asia


In [17]:
# display a few random rows

df.sample(4)

Unnamed: 0,country,population,fertility,continent
4,Indonesia,257563815.0,2.28,Asia
5,Japan,126573481.0,1.45,Asia
6,Mexico,127017224.0,2.13,North America
10,Russia,143456918.0,1.61,Europe


Now execute one of the methods on the object such as `.sum()`

In [18]:
# sum the values from the dataframe

df.sum()

country       BangladeshBrazilChinaIndiaIndonesiaJapanMexico...
population                                         4504153940.0
fertility                                                 29.25
continent     AsiaSouth AmericaAsiaAsiaAsiaAsiaNorth America...
dtype: object

In [19]:
'Name'+'Surname'

'NameSurname'

The `.sum()` method adds up the values of each column. For string columns it concatenates the values, just as `+` does for two strings. What is important to see here is that a method performs an operation on the object it is associated with.
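
If the concatenated strings are not wanted, `.sum()` accepts a `numeric_only` parameter. A minimal sketch on a made-up two-row DataFrame `toy`:

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["Japan", "Mexico"],
    "population": [126573481.0, 127017224.0],
    "fertility": [1.45, 2.13],
})

# Restrict the sum to numeric columns; the string column is left out
numeric_totals = toy.sum(numeric_only=True)

print(numeric_totals)
```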

In [22]:
# what is the overall population?

df['population'].sum()

4504153940.0

### String methods 



In [29]:
# get the length of each of the continent names

df['continent'].str.len()

0      4
1     13
2      4
3      4
4      4
5      4
6     13
7      6
8      4
9      4
10     6
11    13
Name: continent, dtype: int64

In [30]:
# assign this information as a new column

df['continent_len']  =  df['continent'].str.len()

In [31]:
df

Unnamed: 0,country,population,fertility,continent,continent_len
0,Bangladesh,160995600.0,2.12,Asia,4
1,Brazil,207847500.0,1.78,South America,13
2,China,1376049000.0,1.57,Asia,4
3,India,1311051000.0,2.43,Asia,4
4,Indonesia,257563800.0,2.28,Asia,4
5,Japan,126573500.0,1.45,Asia,4
6,Mexico,127017200.0,2.13,North America,13
7,Nigeria,182202000.0,5.89,Africa,6
8,Pakistan,188924900.0,3.04,Asia,4
9,Philippines,100699400.0,2.98,Asia,4


### Saving your data to a file

In [36]:
# save your data to a file

df.to_csv('new_table.csv', header=False, index=False)
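
The effect of `header=False` and `index=False` can be inspected without writing a file, because `to_csv` returns the CSV text when no path is given. This sketch uses a made-up DataFrame `toy`.

```python
import io
import pandas as pd

toy = pd.DataFrame({"country": ["Japan", "Mexico"], "fertility": [1.45, 2.13]})

# With no path given, to_csv returns the CSV text instead of writing a file
with_header = toy.to_csv(index=False)
without_header = toy.to_csv(index=False, header=False)

print(with_header)
print(without_header)

# A file saved without a header row carries no column names,
# so they have to be supplied again when reading it back
restored = pd.read_csv(io.StringIO(without_header), names=["country", "fertility"])
print(restored)
```

Saving with `header=False` is therefore only convenient when whoever reads the file already knows the column names.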

## 3. Attributes



Attributes are values that describe a defined Python object. In the case of a pandas DataFrame object, one attribute is the shape of the DataFrame: how many rows and columns it has. An attribute is accessed with the same dot notation as a method but has no `()` after the name.

| Command           | Description                                                  |
|-------------------|--------------------------------------------------------------|
| df.shape          | returns tuple representing the dimensionality of the DataFrame |
| df.index          | returns the index as an array-like object                    |
| df.columns        | returns the column index as an array-like object             |
| df.dtypes         | returns the data types in the DataFrame                      |
| df.values         | returns the values of the DataFrame as an array-like object  |
| df.ndim           | returns an integer representing the number of axes            |


Which functions, methods, and attributes exist depends on how the developers of a language or library designed it. The pandas developers use object-oriented programming and provide all three to users.

In [38]:
# let's describe our dataframe: what is the shape?

df.shape

(12, 5)

In [39]:
# what is the size? and how to interpret it?

df.size

60
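
One way to interpret `size`: it is the total number of cells, that is, rows times columns (12 × 5 = 60 for the DataFrame above). A sketch on a made-up DataFrame `toy`:

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape is a (rows, columns) tuple and can be unpacked
n_rows, n_cols = toy.shape

print(toy.size)
print(n_rows * n_cols)
```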

In [45]:
# what are the column names?

df.columns

Index(['country', 'population', 'fertility', 'continent', 'continent_len'], dtype='object')

In [46]:
df.info()
# this is a method though: it performs some functionality
# on the DataFrame object and prints summarized information

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   country        12 non-null     object 
 1   population     12 non-null     float64
 2   fertility      12 non-null     float64
 3   continent      12 non-null     object 
 4   continent_len  12 non-null     int64  
dtypes: float64(2), int64(1), object(2)
memory usage: 608.0+ bytes
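
Note that `.info()` prints its summary rather than handing it back as a value. A quick check on a made-up DataFrame `toy`:

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2, 3]})

# The summary is printed as a side effect of the call
result = toy.info()

# The call itself returns nothing
print(result)
```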


## Question - what is a function / method / attribute?
- `pd.DataFrame(data)`
- `df.sum()`
- `df.shape`

- what is `df.info()` ?