## DataFrames
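- A DataFrame is a two-dimensional labeled data structure whose columns can hold different types. You can think of it as a spreadsheet, a SQL table, or a dictionary of Series objects. It is generally the most commonly used pandas object.
- A DataFrame can be seen as a collection of one or more Series that share the same index.
- axis=0 refers to the index (rows): an operation along axis=0 runs down the rows and returns one result per column.
- axis=1 refers to the columns: an operation along axis=1 runs across the columns and returns one result per row.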
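
As a minimal sketch of the "dictionary of Series" view described above, each column can be built as its own Series and then combined. The column names (`name`, `qty`, `ratio`) and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical columns, each a Series holding a different kind of data
names = pd.Series(['x', 'y', 'z'])
qty = pd.Series([1, 2, 3])
ratio = pd.Series([0.5, 1.5, 2.5])

# A dict of Series becomes a DataFrame: the keys become the column labels
frame = pd.DataFrame({'name': names, 'qty': qty, 'ratio': ratio})

print(frame.dtypes)          # one dtype per column
print(type(frame['qty']))    # each column is itself a Series
print(frame.shape)           # (3, 3)
```

Because all three Series use the same default index (0, 1, 2), their rows line up one to one in the resulting DataFrame.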

In [1]:
import pandas as pd

In [7]:
data = {'a':[100, 200, 300], 'b':[400, 500, 600], 'c':[700, 800, 900]}
df = pd.DataFrame(data)
print(df)

     a    b    c
0  100  400  700
1  200  500  800
2  300  600  900


In [3]:
print(df['a'])           # Selecting a single column returns a Series

0    100
1    200
2    300
Name: a, dtype: int64


In [4]:
df = pd.DataFrame(data=[[100, 200, 300], [400, 500, 600], [700, 800, 900]], index=['1st', '2nd', '3rd'], columns=['a', 'b', 'c'])
print(df)
print("\nValues of a: \n", df.a)

       a    b    c
1st  100  200  300
2nd  400  500  600
3rd  700  800  900

Values of a: 
 1st    100
2nd    400
3rd    700
Name: a, dtype: int64


### DataFrame attributes

In [24]:
richest = pd.read_csv('TopRichestInWorld.csv')
richest

|     | Name | NetWorth | Age | Country/Territory | Source | Industry |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Elon Musk | $219,000,000,000 | 50 | United States | Tesla, SpaceX | Automotive |
| 1 | Jeff Bezos | $171,000,000,000 | 58 | United States | Amazon | Technology |
| 2 | Bernard Arnault & family | $158,000,000,000 | 73 | France | LVMH | Fashion & Retail |
| 3 | Bill Gates | $129,000,000,000 | 66 | United States | Microsoft | Technology |
| 4 | Warren Buffett | $118,000,000,000 | 91 | United States | Berkshire Hathaway | Finance & Investments |
| ... | ... | ... | ... | ... | ... | ... |
| 96 | Vladimir Potanin | $17,300,000,000 | 61 | Russia | metals | Metals & Mining |
| 97 | Harold Hamm & family | $17,200,000,000 | 76 | United States | oil & gas | Energy |
| 98 | Sun Piaoyang | $17,100,000,000 | 63 | China | pharmaceuticals | Healthcare |
| 99 | Luo Liguo & family | $17,000,000,000 | 66 | China | chemicals | Manufacturing |


In [25]:
richest.shape

(101, 6)

In [26]:
richest.size

606

In [27]:
richest.columns

Index(['Name', 'NetWorth', 'Age', 'Country/Territory', 'Source', 'Industry'], dtype='object')

In [28]:
richest.index

RangeIndex(start=0, stop=101, step=1)

In [29]:
richest.axes

[RangeIndex(start=0, stop=101, step=1),
 Index(['Name', 'NetWorth', 'Age', 'Country/Territory', 'Source', 'Industry'], dtype='object')]

In [30]:
richest.dtypes

Name                 object
NetWorth             object
Age                   int64
Country/Territory    object
Source               object
Industry             object
dtype: object
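
The attributes above are related to one another. The following is a small sketch on a made-up two-column frame (not the `richest` data) showing that `size` is the product of the two `shape` entries and that `axes` is simply the index followed by the columns.

```python
import pandas as pd

# Small illustrative frame; the values are arbitrary
df = pd.DataFrame({'a': [100, 200, 300], 'b': [400, 500, 600]})

rows, cols = df.shape                   # (number of rows, number of columns)
print(df.shape)                         # (3, 2)
print(df.size)                          # 6, i.e. rows * cols
print(df.axes[0].equals(df.index))      # True: axes[0] is the row index
print(df.axes[1].equals(df.columns))    # True: axes[1] is the column index
```

The same relationship holds for the output shown above: 101 rows times 6 columns gives a size of 606.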

### DataFrame methods
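- Many DataFrame methods share their names with Series methods, but they can behave differently. For example, `sum()` on a Series returns a single value, while `sum()` on a DataFrame returns one value per column (or per row, depending on `axis`).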

In [34]:
series = pd.Series([100, 200, 300])
series

0    100
1    200
2    300
dtype: int64

In [35]:
data = {'a': [100, 200, 300], 'b': [400, 500, 600], 'c': [700, 800, 900]}
df = pd.DataFrame(data)
df

     a    b    c
0  100  400  700
1  200  500  800
2  300  600  900


In [38]:
a = series.sum()
print(a)

b = df.sum()
print(b)

600
a     600
b    1500
c    2400
dtype: int64


In [None]:
# series.sum(axis=1)  # Raises an exception: a Series has only one axis (axis=0), so there are no columns to sum across
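
A minimal sketch of how the commented-out call above fails. It assumes the exception is a `ValueError`, which is what recent pandas versions raise for an axis that a Series does not have; the exact message may differ between versions.

```python
import pandas as pd

series = pd.Series([100, 200, 300])

try:
    series.sum(axis=1)          # a Series has no axis 1
    raised = False
except ValueError as exc:       # assumed exception type, see the note above
    raised = True
    print(exc)

print(raised)                   # True
print(series.sum(axis=0))       # 600: axis=0 is the only valid axis
```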

In [None]:
df.sum(axis=0)      # axis=0 is the default when no axis is given

a     600
b    1500
c    2400
dtype: int64

In [43]:
df.sum(axis=1)

0    1200
1    1500
2    1800
dtype: int64

In [None]:
df.sum(axis='columns')    # Same as axis=1

0    1200
1    1500
2    1800
dtype: int64

In [None]:
df.sum(axis='index')      # Same as axis=0

a     600
b    1500
c    2400
dtype: int64

In [47]:
df.max(axis=0)

a    300
b    600
c    900
dtype: int64

In [48]:
df.max(axis=1)

0    700
1    800
2    900
dtype: int64
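
One way to keep the two axes apart is to look at the labels of the result. The sketch below reuses the labeled frame built earlier (index `1st`, `2nd`, `3rd`) and shows that reducing along axis=0 yields a Series labeled by column names, while reducing along axis=1 yields a Series labeled by row labels.

```python
import pandas as pd

df = pd.DataFrame(
    [[100, 200, 300], [400, 500, 600], [700, 800, 900]],
    index=['1st', '2nd', '3rd'],
    columns=['a', 'b', 'c'],
)

col_max = df.max(axis=0)    # collapses the rows: one value per column
row_max = df.max(axis=1)    # collapses the columns: one value per row

print(list(col_max.index))  # ['a', 'b', 'c']
print(list(row_max.index))  # ['1st', '2nd', '3rd']
print(col_max['a'])         # 700
print(row_max['2nd'])       # 600
```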