# Sources

[Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/)

[Dataset](https://www.kaggle.com/szymonjanowski/internet-articles-data-with-users-engagement)

[Corey Schafer tutorial](https://www.youtube.com/watch?v=ZyhVh-qRZPA&list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS)

[Corey Schafer channel](https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g)

---

In [1]:
import pandas as pd

## Loading data
[Pandas read options](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html)

In [2]:
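# The CSV was saved together with its index, which pandas reads back as 'Unnamed: 0'.
df = pd.read_csv('data.csv', index_col='Unnamed: 0')

# Show the first two rows; df.tail(n) shows the last n rows instead.
df.head(2)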

Unnamed: 0,source_id,source_name,author,title,description,url,url_to_image,published_at,content,top_article,engagement_reaction_count,engagement_comment_count,engagement_share_count,engagement_comment_plugin_count
0,reuters,Reuters,Reuters Editorial,NTSB says Autopilot engaged in 2018 California...,The National Transportation Safety Board said ...,https://www.reuters.com/article/us-tesla-crash...,https://s4.reutersmedia.net/resources/r/?m=02&...,2019-09-03T16:22:20Z,WASHINGTON (Reuters) - The National Transporta...,0.0,0.0,0.0,2528.0,0.0
1,the-irish-times,The Irish Times,Eoin Burke-Kennedy,Unemployment falls to post-crash low of 5.2%,Latest monthly figures reflect continued growt...,https://www.irishtimes.com/business/economy/un...,https://www.irishtimes.com/image-creator/?id=1...,2019-09-03T10:32:28Z,The States jobless rate fell to 5.2 per cent l...,0.0,6.0,10.0,2.0,0.0
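To see what `index_col` changes, here is a minimal sketch that reads a tiny in-memory CSV instead of `data.csv`. The `csv_text` contents and the names `raw` and `sample` are made up for illustration; only the `Unnamed: 0` behaviour mirrors the cell above.

```python
import io

import pandas as pd

# Hypothetical two-row CSV standing in for data.csv. The empty first
# header cell is what pandas names "Unnamed: 0" on load.
csv_text = (
    ",source_id,top_article\n"
    "0,reuters,0.0\n"
    "1,the-irish-times,0.0\n"
)

# Without index_col, the saved index comes back as an ordinary column.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())  # ['Unnamed: 0', 'source_id', 'top_article']

# index_col=0 picks the same column by position instead of by name.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample.columns.tolist())  # ['source_id', 'top_article']
```

Selecting the index by position avoids hard-coding the auto-generated `Unnamed: 0` name.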


## Display parameters
[pandas.set_option parameters](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.set_option.html?highlight=set_option#pandas-set-option)

In [3]:
pd.set_option('display.max_columns', 20)
pd.set_option('display.max_rows', 20)
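`pd.set_option` changes the setting for the rest of the session. If a different value is only needed for one cell, `pd.option_context` can apply it temporarily. The sketch below assumes nothing beyond pandas itself; the value `5` is arbitrary.

```python
import pandas as pd

# Remember the current value so it can be compared afterwards.
before = pd.get_option("display.max_rows")

# Inside the with-block the option is overridden; it is restored on exit.
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")
print(before, inside, after)
```

`pd.reset_option("display.max_rows")` goes back to the pandas default instead.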

## Getting information

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10437 entries, 0 to 10436
Data columns (total 14 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   source_id                        10437 non-null  object 
 1   source_name                      10437 non-null  object 
 2   author                           9417 non-null   object 
 3   title                            10435 non-null  object 
 4   description                      10413 non-null  object 
 5   url                              10436 non-null  object 
 6   url_to_image                     9781 non-null   object 
 7   published_at                     10436 non-null  object 
 8   content                          9145 non-null   object 
 9   top_article                      10435 non-null  float64
 10  engagement_reaction_count        10319 non-null  float64
 11  engagement_comment_count         10319 non-null  float64
 12  engagement_share_c
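The Non-Null Count column of `df.info()` shows how many values are present. To get the number of missing values per column directly, `isna().sum()` is one option. The frame below is a hypothetical stand-in, not the article dataset.

```python
import pandas as pd

# Hypothetical three-row frame with a few missing entries.
sample = pd.DataFrame({
    "author": ["A. Writer", None, "B. Writer"],
    "content": [None, None, "some text"],
})

# isna() marks missing cells; summing the booleans counts them per column.
missing = sample.isna().sum()
print(missing)
```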

In [5]:
# df.columns would list all column labels.

In [6]:
df["source_id"].value_counts()

reuters                    1252
bbc-news                   1242
the-irish-times            1232
abc-news                   1139
cnn                        1132
business-insider           1048
the-new-york-times          986
cbs-news                    952
newsweek                    539
al-jazeera-english          499
the-wall-street-journal     333
espn                         82
1                             1
Name: source_id, dtype: int64
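`value_counts` also takes `normalize` and `dropna` arguments, which help when shares or missing values matter. A small sketch on a made-up Series (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Hypothetical Series with one missing entry.
sources = pd.Series(["reuters", "bbc-news", "reuters", None, "reuters"])

# Default: absolute counts, missing values are skipped.
counts = sources.value_counts()

# normalize=True returns fractions of the non-missing values.
shares = sources.value_counts(normalize=True)

# dropna=False adds a row counting the missing values.
with_missing = sources.value_counts(dropna=False)

print(counts)
print(shares)
print(with_missing)
```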

In [7]:
df.shape

(10437, 14)

## DataFrame
[DataFrame API reference](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html)

In [8]:
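# Build a small DataFrame from a dict that maps column names to lists of values.
people = {
    "First": ["Thiago", "Corey", "John"],
    "Last": ["Zen", "Schafer", "Doe"],
    "Email": ["Zeb@email.com", "schafer@email.com", "Doe@email.com"],
}

df_people = pd.DataFrame(people)

# Passing a list of column names selects a sub-DataFrame.
df_people[['First', 'Last']]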

,First,Last
0,Thiago,Zen
1,Corey,Schafer
2,John,Doe
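The number of brackets determines the type of the result: a single label returns a Series, while a list of labels returns a DataFrame, even for one column. A minimal sketch, where `people_demo` is a hypothetical frame built only for this example:

```python
import pandas as pd

people_demo = pd.DataFrame({
    "First": ["Thiago", "Corey"],
    "Last": ["Zen", "Schafer"],
})

# A single label returns a one-dimensional Series.
one_col = people_demo["First"]

# A list of labels returns a two-dimensional DataFrame.
sub_frame = people_demo[["First"]]

print(type(one_col).__name__, type(sub_frame).__name__)  # Series DataFrame
```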


### Getting rows

In [9]:
# iloc selects purely by integer position; the slice 0:2 stops before row 2.

df_people.iloc[0:2, [0, 1]]

,First,Last
0,Thiago,Zen
1,Corey,Schafer
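A few more `iloc` patterns, sketched on a hypothetical copy of the people data (`demo` is not part of the notebook): positions work like Python list indexing, so negative values count from the end.

```python
import pandas as pd

demo = pd.DataFrame({
    "First": ["Thiago", "Corey", "John"],
    "Last": ["Zen", "Schafer", "Doe"],
    "Email": ["Zeb@email.com", "schafer@email.com", "Doe@email.com"],
})

# Slice of rows: positions 0 and 1, all columns.
first_two = demo.iloc[0:2]

# A single negative position returns the last row as a Series.
last_row = demo.iloc[-1]

# Rows 0 and 2, column at position 1 ("Last").
picked = demo.iloc[[0, 2], 1]

print(first_two.shape)   # (2, 3)
print(last_row["First"])  # John
print(picked.tolist())   # ['Zen', 'Doe']
```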


In [10]:
# loc selects by label or boolean array; label slices include both endpoints.

df_people.loc[[1, 2], 'First':'Email']

,First,Last,Email
1,Corey,Schafer,schafer@email.com
2,John,Doe,Doe@email.com
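Two points about `loc` that are easy to miss: label slices include the end label, and a boolean mask can take the place of the row labels. The sketch below uses a hypothetical `demo` frame with the same columns as `df_people`.

```python
import pandas as pd

demo = pd.DataFrame({
    "First": ["Thiago", "Corey", "John"],
    "Last": ["Zen", "Schafer", "Doe"],
    "Email": ["Zeb@email.com", "schafer@email.com", "Doe@email.com"],
})

# Label slice 0:1 includes row 1; 'First':'Last' includes 'Last'.
by_label = demo.loc[0:1, "First":"Last"]

# A boolean Series keeps only the rows where the condition is True.
doe_email = demo.loc[demo["Last"] == "Doe", "Email"]

print(by_label.shape)      # (2, 2)
print(doe_email.tolist())  # ['Doe@email.com']
```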


## Indexes

In [11]:
df_people.set_index('First', inplace=True)

In [12]:
df_people.loc[["Thiago"], ["Email"]]

First,Email
Thiago,Zeb@email.com
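As an alternative to `inplace=True`, `set_index` can return a new frame and leave the original untouched. A minimal sketch with the hypothetical names `demo` and `indexed`:

```python
import pandas as pd

demo = pd.DataFrame({
    "First": ["Thiago", "Corey", "John"],
    "Last": ["Zen", "Schafer", "Doe"],
    "Email": ["Zeb@email.com", "schafer@email.com", "Doe@email.com"],
})

# Without inplace, set_index returns a new DataFrame.
indexed = demo.set_index("First")

# 'First' is now the index of the new frame, no longer a column.
print(indexed.index.name)            # First
print("First" in indexed.columns)    # False
print("First" in demo.columns)       # True

# A scalar row label plus a scalar column label returns a single value.
print(indexed.loc["Corey", "Email"])  # schafer@email.com
```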


In [13]:
df_people.sort_index()

First,Last,Email
Corey,Schafer,schafer@email.com
John,Doe,Doe@email.com
Thiago,Zen,Zeb@email.com
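`sort_index` returns a sorted copy by default, so the frame keeps its original order unless the result is assigned. The sketch below, on a hypothetical `indexed` frame, also shows descending order via `ascending=False`.

```python
import pandas as pd

indexed = pd.DataFrame({
    "First": ["Thiago", "Corey", "John"],
    "Last": ["Zen", "Schafer", "Doe"],
}).set_index("First")

# Descending sort on the index labels; the result is a new frame.
descending = indexed.sort_index(ascending=False)

print(descending.index.tolist())  # ['Thiago', 'John', 'Corey']
print(indexed.index.tolist())     # ['Thiago', 'Corey', 'John']
```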


In [14]:
df_people.reset_index(inplace=True)
df_people

,First,Last,Email
0,Thiago,Zen,Zeb@email.com
1,Corey,Schafer,schafer@email.com
2,John,Doe,Doe@email.com
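By default `reset_index` moves the old index back into the columns. With `drop=True` the old index is discarded instead. A short sketch on a hypothetical `indexed` frame:

```python
import pandas as pd

indexed = pd.DataFrame({
    "First": ["Thiago", "Corey", "John"],
    "Last": ["Zen", "Schafer", "Doe"],
    "Email": ["Zeb@email.com", "schafer@email.com", "Doe@email.com"],
}).set_index("First")

# Default: 'First' becomes the first column again.
restored = indexed.reset_index()

# drop=True: the 'First' labels are thrown away.
dropped = indexed.reset_index(drop=True)

print(restored.columns.tolist())  # ['First', 'Last', 'Email']
print(dropped.columns.tolist())   # ['Last', 'Email']
```

Both variants return a new frame with a fresh 0, 1, 2 integer index.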
