# Pandas Overview
- The pandas library provides high-performance, easy-to-use data structures and data analysis tools. The main data structure is the DataFrame, which you can think of as an in-memory 2D table (like a spreadsheet, with column names and row labels).

- Many features available in Excel are available programmatically, such as creating pivot tables, computing columns based on other columns, plotting graphs, etc. You can also group rows by column value, or join tables much like in SQL. Pandas is also great at handling time series data.
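
As a rough illustration of the spreadsheet-style and SQL-style operations mentioned above, here is a minimal sketch on a toy table. The `sales` and `managers` frames and all of their column names are made up for this example and are not part of the dataset used later.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["a", "b", "a", "b"],
    "units": [3, 5, 2, 4],
    "price": [10.0, 20.0, 10.0, 20.0],
})

# Compute a column from other columns.
sales["revenue"] = sales["units"] * sales["price"]

# Group rows by a column value and aggregate.
by_region = sales.groupby("region")["revenue"].sum()

# Build a pivot table: one row per region, one column per product.
pivot = sales.pivot_table(
    index="region", columns="product", values="revenue", aggfunc="sum"
)

# Join two tables on a shared key, much like a SQL LEFT JOIN.
managers = pd.DataFrame({"region": ["east", "west"], "manager": ["Ana", "Ben"]})
joined = sales.merge(managers, on="region", how="left")

print(by_region)
print(pivot)
print(joined)
```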

In [1]:
import pandas as pd

## Pandas is Column-major
An important thing to know about pandas is that it is column-major, which explains many of its quirks.

- Column-major means consecutive elements in a column are stored next to each other in memory. Row-major means the same but for elements in a row. Because modern computers process sequential data more efficiently than non-sequential data, if a table is row-major, accessing its rows will be much faster than accessing its columns. Conversely, if a table is column-major, accessing its columns will be much faster than accessing its rows.
- In NumPy, major order can be specified. When an ndarray is created, it’s row-major by default if you don’t specify the order.
- Like R’s Data Frame, pandas’ DataFrame is column-major. People coming to pandas from NumPy tend to treat DataFrame the way they would ndarray, e.g. trying to access data by rows, and find DataFrame slow.
- Note: A column in a DataFrame is a Series. You can think of a DataFrame as a bunch of Series being stored next to each other in memory.
- For our dataset, accessing a row takes roughly 75x longer than accessing a column in our DataFrame (about 107 µs versus 1.39 µs in the timings below).
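
To make the row-major versus column-major distinction concrete, the sketch below builds the same small NumPy array in both layouts and inspects its flags and strides. The array contents and the explicit `int64` dtype are arbitrary choices for illustration.

```python
import numpy as np

# Row-major (C order) is NumPy's default layout.
a = np.arange(6, dtype=np.int64).reshape(2, 3)

# Same values, stored column-major (Fortran order).
f = np.asfortranarray(a)

# The values are identical; only the memory layout differs.
print(np.array_equal(a, f))

# Flags report which layout each array uses.
print(a.flags["C_CONTIGUOUS"], a.flags["F_CONTIGUOUS"])
print(f.flags["C_CONTIGUOUS"], f.flags["F_CONTIGUOUS"])

# Strides are the bytes to step per axis (rows, columns).
# Row-major: neighbors within a row are 8 bytes apart.
# Column-major: neighbors within a column are 8 bytes apart.
print(a.strides)
print(f.strides)
```

In the row-major array the smallest stride is along a row, while in the column-major array it is along a column, which is the layout the bullets above attribute to a DataFrame.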

In [2]:
df = pd.read_csv("data/interviews.csv")

print(df.shape)
df.head()

(17654, 10)


Unnamed: 0,Company,Title,Job,Level,Date,Upvotes,Offer,Experience,Difficulty,Review
0,Apple,Software Engineer,Software Engineer,Engineer,"Aug 7, 2019",0,No offer,0.0,Medium,Application I applied through a staffing agen...
1,Apple,Software Engineer,Software Engineer,Engineer,"Aug 8, 2019",0,Accepted offer,1.0,Hard,Application I applied online. The process too...
2,Apple,Software Engineer,Software Engineer,Engineer,,0,Declined offer,0.0,Medium,Application The process took 4 weeks. I inter...
3,Apple,Software Engineer,Software Engineer,Engineer,,9,Declined offer,-1.0,Medium,Application The process took a week. I interv...
4,Apple,Software Engineer,Software Engineer,Engineer,"May 29, 2009",2,No offer,0.0,Medium,Application I applied through an employee ref...


In [3]:
# Get the column `Date`, 1000 loops
%timeit -n1000 df["Date"]

# Get the first row, 1000 loops
%timeit -n1000 df.iloc[0]

1.39 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
107 µs ± 2.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
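
If `data/interviews.csv` is not at hand, a comparable measurement can be sketched outside a notebook with the standard `timeit` module. The `demo` frame and its column names below are synthetic stand-ins, so the absolute numbers will differ from those above and depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the interviews data (hypothetical columns, mixed dtypes).
n = 10_000
demo = pd.DataFrame({
    "company": ["Apple"] * n,
    "upvotes": np.arange(n),
    "experience": np.zeros(n),
})

col = demo["upvotes"]  # a Series for one column
row = demo.iloc[0]     # a new Series assembled from one value per column

loops = 1000
col_time = timeit.timeit(lambda: demo["upvotes"], number=loops)
row_time = timeit.timeit(lambda: demo.iloc[0], number=loops)

print(f"column: {col_time / loops * 1e6:.2f} µs per loop")
print(f"row:    {row_time / loops * 1e6:.2f} µs per loop")
```

Because the columns have different dtypes, the row comes back as an `object` Series indexed by the column names, which hints at the extra work a row lookup involves.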
