# Introduction

pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

pandas is well suited for many different kinds of data:

* Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
* Ordered and unordered (not necessarily fixed-frequency) time series data
* Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
* Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. For R users, DataFrame provides everything that R’s data.frame provides and much more. pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other 3rd party libraries.
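
As a minimal sketch of the two structures, the snippet below builds a Series and a DataFrame by hand. The player names, teams, and ratings are made up for illustration and are not taken from the dataset used later.

```python
import pandas as pd

# A Series is a one-dimensional labeled array (names and values are made up).
ratings = pd.Series([97, 96, 95], index=["Player A", "Player B", "Player C"], name="rating")

# A DataFrame is a two-dimensional table; each column can have its own dtype.
players = pd.DataFrame(
    {
        "full_name": ["Player A", "Player B", "Player C"],
        "rating": [97, 96, 95],
        "team": ["Team X", "Team Y", "Team X"],
    }
)

# Selecting a single column of a DataFrame gives back a Series.
rating_column = players["rating"]
print(type(rating_column).__name__)
print(players.shape)
```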

Here are just a few of the things that pandas does well:

* Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
* Size mutability: columns can be inserted into and deleted from DataFrame and higher-dimensional objects
* Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data in computations
* Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
* Easy conversion of ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
* Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
* Intuitive merging and joining of data sets
* Flexible reshaping and pivoting of data sets
* Hierarchical labeling of axes (possible to have multiple labels per tick)
* Robust IO tools for loading data from flat files (CSV and delimited), Excel files, and databases, and for saving / loading data in the ultrafast HDF5 format
* Time series-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging
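
Two of the points above, automatic alignment and missing-data handling, can be sketched with two small Series whose labels only partly overlap. The labels and numbers are arbitrary.

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Addition aligns on index labels; "x" has no partner in b, so the result is NaN there.
total = a + b
print(total)

# fill_value treats the missing side as 0 instead of producing NaN.
filled = a.add(b, fill_value=0)
print(filled)
```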

The aim of this notebook is to present the most common methods of the pandas library and show how to use them. I hope this kernel will be useful for everyone.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
data = pd.read_csv("../input/nba2k20-player-dataset/nba2k20-full.csv")

## `.head()` and `.tail()` Methods

In [None]:
data.head(2) #data.head(n) will show first n rows

In [None]:
data.tail(2) #data.tail(n) will show last n rows 
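
A small self-contained sketch of the same two methods on a ten-row frame (the column name `n` is arbitrary). It also shows the default of five rows and the negative-`n` form.

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})

first_default = df.head()         # n defaults to 5
last_two = df.tail(2)             # the last 2 rows
all_but_last_three = df.head(-3)  # negative n: every row except the last 3

print(len(first_default), len(last_two), len(all_but_last_three))
```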

## Python Built-In Functions and DataFrame Attributes

In [None]:
len(data)

In [None]:
type(data)

In [None]:
list(data)

In [None]:
dict(data)

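In [None]:
max(data) # iterates over column labels, so this is the "largest" column name, not the largest value

In [None]:
min(data) # likewise the "smallest" column name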

In [None]:
data.values

In [None]:
data.index

In [None]:
data.shape

In [None]:
data.columns

In [None]:
data.axes
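
The built-ins above iterate over a DataFrame's column labels, which can be surprising. A minimal sketch on a made-up two-row frame shows what each call returns:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "full_name": ["Player A", "Player B"],
        "rating": [97, 96],
        "team": ["Team X", "Team Y"],
    }
)

n_rows = len(df)                     # number of rows
labels = list(df)                    # column labels, not row data
largest_label = max(df)              # compares the column labels as strings
highest_rating = df["rating"].max()  # the largest value within one column

print(n_rows, labels, largest_label, highest_rating)
print(df.shape)                      # (rows, columns)
```

To get the largest value of a column, call `.max()` on that column rather than passing the whole frame to `max`.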

## `in` Keyword

In [None]:
1 in [1,2,3,4,5]

In [None]:
50 in data.index
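
With pandas objects, `in` tests labels rather than cell values. The sketch below, on a made-up frame, contrasts the label checks with two ways of searching the values explicitly.

```python
import pandas as pd

df = pd.DataFrame({"full_name": ["Player A", "Player B"], "rating": [97, 96]})

label_in_index = 1 in df.index        # True: 1 is a row label of the default index
column_in_frame = "rating" in df      # True: `in` on a DataFrame checks column labels
value_in_frame = "Player A" in df     # False: cell values are not searched

# To search the values, convert them to a list or use isin().
value_in_column = "Player A" in df["full_name"].tolist()
value_via_isin = df["full_name"].isin(["Player A"]).any()

print(label_in_index, column_in_frame, value_in_frame, value_in_column, value_via_isin)
```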

In [None]:
"Messi" in data.index

## Select Columns from the DataFrame

In [None]:
data["team"]
data[["full_name", "jersey"]]

## Add a New Column to the DataFrame

In [None]:
data["League"] = "NBA"
data["Sport"] = "Basketball"

In [None]:
data = pd.read_csv("../input/nba2k20-player-dataset/nba2k20-full.csv")
data.insert(3, column = "Sport", value = "Basketball")
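
The two approaches differ in where the new column ends up. The sketch below uses a made-up frame with lowercase column names chosen for illustration. It also shows that `insert` refuses a column name that already exists, which is presumably why the notebook reloads the CSV before the `insert` cell.

```python
import pandas as pd

df = pd.DataFrame({"full_name": ["Player A", "Player B"], "rating": [97, 96]})

# Assignment broadcasts the scalar to every row and appends the column at the end.
df["league"] = "NBA"

# insert() places the column at a chosen position (0-based) and works in place.
df.insert(1, column="sport", value="Basketball")
print(list(df.columns))

# Inserting a name that already exists raises ValueError (unless allow_duplicates=True).
insert_refused = False
try:
    df.insert(0, column="sport", value="Football")
except ValueError:
    insert_refused = True
print(insert_refused)
```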

## Basic Math Operations

In [None]:
data = pd.read_csv("../input/nba2k20-player-dataset/nba2k20-full.csv")

In [None]:
data["rating"].add(10)
data["rating"] + 5

data["draft_year"].sub(2)
data["draft_year"] - 2

data["draft_peak"].mul(1)
data["draft_peak"] * 1

data["rating"].div(1)
data["rating"] / 1
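
The method and operator forms above are interchangeable, and neither changes the DataFrame unless the result is assigned back. A minimal sketch with made-up numbers (the column `rating_plus_ten` is a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame({"rating": [97, 96, 95], "draft_year": [2003, 2011, 2013]})

plus_ten_method = df["rating"].add(10)
plus_ten_operator = df["rating"] + 10
same_result = plus_ten_method.equals(plus_ten_operator)

# Arithmetic returns a new Series; the original column is untouched until reassigned.
unchanged = df["rating"].tolist()

# True division yields floats, even when dividing integers by 1.
divided = df["rating"].div(1)
print(same_result, unchanged, divided.dtype)

# Assign the result to keep it.
df["rating_plus_ten"] = plus_ten_operator
```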

## `.value_counts()` Method

In [None]:
data["position"].value_counts()
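
A short sketch of the options that are often useful alongside the plain call, using a made-up Series of position codes with one missing entry:

```python
import pandas as pd

positions = pd.Series(["G", "F", "G", "C", "G", None])

counts = positions.value_counts()                # most frequent first, missing values excluded
shares = positions.value_counts(normalize=True)  # fractions of the non-missing total
with_missing = positions.value_counts(dropna=False)  # also counts the missing entry

print(counts)
print(shares)
print(with_missing)
```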

## Dropping Null Values

In [None]:
data.dropna(how = "all", inplace = True)

In [None]:
data.dropna(subset = ["position", "country"])
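
The `how` and `subset` arguments decide which rows count as "missing". The sketch below uses a made-up three-row frame. Note that without `inplace = True` the filtered frame is returned, so it has to be assigned to a name to be kept.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "full_name": ["Player A", "Player B", np.nan],
        "position": ["G", np.nan, np.nan],
        "country": ["USA", "Spain", np.nan],
    }
)

no_empty_rows = df.dropna(how="all")   # drops only rows where every value is missing
complete_rows = df.dropna()            # default how="any": drops rows with any missing value
has_both = df.dropna(subset=["position", "country"])  # only these columns are checked

print(len(df), len(no_empty_rows), len(complete_rows), len(has_both))
```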

## `.astype()` Method

In [None]:
data = pd.read_csv("../input/nba2k20-player-dataset/nba2k20-full.csv")

In [None]:
data.info()

In [None]:
data["team"] = data["team"].astype("category")
data["country"] = data["country"].astype("category")
data.info() # memory usage decreases, which is useful for large datasets
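
The saving can be measured directly with `memory_usage(deep=True)`. This is a rough sketch on a synthetic column of three team names repeated many times; the exact byte counts depend on the pandas version and platform.

```python
import pandas as pd

# 3,000 rows but only 3 distinct values: a good candidate for the category dtype.
teams = pd.Series(["Los Angeles Lakers", "Boston Celtics", "Miami Heat"] * 1000)

bytes_before = teams.memory_usage(deep=True)

as_category = teams.astype("category")
bytes_after = as_category.memory_usage(deep=True)

print(bytes_before, bytes_after)
print(list(as_category.cat.categories))  # the distinct values are stored once
```

The benefit shrinks as the number of distinct values approaches the number of rows.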

## `.sort_values()` Method

In [None]:
data.sort_values("rating", ascending = False, inplace = True)
data.sort_values(["team", "rating"], ascending=[True, False], inplace = True)
data.head()
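
Each `sort_values` call re-sorts the whole frame, so when two calls run back to back the second one decides the final order. A sketch on made-up rows, which also shows the `ignore_index` option (available in pandas 1.0 and later):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "full_name": ["Player A", "Player B", "Player C", "Player D"],
        "team": ["Team Y", "Team X", "Team X", "Team Y"],
        "rating": [90, 85, 95, 92],
    }
)

# Teams A to Z, and within each team the highest rating first.
ranked = df.sort_values(["team", "rating"], ascending=[True, False])
print(ranked)

# The original row labels travel with the rows; ignore_index=True renumbers them.
renumbered = df.sort_values("rating", ascending=False, ignore_index=True)
print(renumbered)
```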

## `.set_index()` and `.reset_index()` Methods

In [None]:
data = pd.read_csv("../input/nba2k20-player-dataset/nba2k20-full.csv")

In [None]:
data.set_index("full_name", inplace = True)
data.head(2)

In [None]:
data.reset_index(drop = False, inplace = True)
data.tail(2)
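
The `drop` argument of `reset_index` controls whether the old index survives as a column. A minimal sketch on a made-up frame, without `inplace` so each step stays visible:

```python
import pandas as pd

df = pd.DataFrame({"full_name": ["Player A", "Player B"], "rating": [97, 96]})

indexed = df.set_index("full_name")       # "full_name" moves from the columns to the index
restored = indexed.reset_index()          # drop=False (default): the index becomes a column again
discarded = indexed.reset_index(drop=True)  # the old index is thrown away

print(list(indexed.columns), indexed.index.name)
print(list(restored.columns))
print(list(discarded.columns))
```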

## `.loc[]`

In [None]:
data = pd.read_csv("../input/nba2k20-player-dataset/nba2k20-full.csv", index_col="full_name")

In [None]:
data.sample(3)

In [None]:
data.loc["Kawhi Leonard"]

In [None]:
data.loc["Kawhi Leonard":"Malcolm Brogdon"]

In [None]:
data.loc[:"Malcolm Brogdon"]

In [None]:
data.loc[["Kawhi Leonard", "Malcolm Brogdon"]]
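
The same four access patterns on a tiny made-up frame make the return types easier to see. Note that a label slice includes both endpoints, unlike ordinary Python slicing.

```python
import pandas as pd

df = pd.DataFrame(
    {"rating": [97, 96, 95, 94]},
    index=["Player A", "Player B", "Player C", "Player D"],
)

row = df.loc["Player B"]                   # one label: a Series holding that row
window = df.loc["Player B":"Player C"]     # label slice: both endpoints are included
picked = df.loc[["Player D", "Player A"]]  # list of labels: rows come back in the listed order
cell = df.loc["Player B", "rating"]        # row label and column label: a single value

print(row, window, picked, cell, sep="\n")
```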

## `.iloc[]`

In [None]:
data = pd.read_csv("../input/nba2k20-player-dataset/nba2k20-full.csv")

In [None]:
data.iloc[3]

In [None]:
data.iloc[[4,10]]

In [None]:
data.iloc[4:10]

In [None]:
data.iloc[:10]
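
`.iloc[]` works purely by position, whatever the index labels are. The sketch below uses a made-up frame whose labels are 10, 20, 30, 40 so that positions and labels cannot be confused.

```python
import pandas as pd

df = pd.DataFrame({"rating": [97, 96, 95, 94]}, index=[10, 20, 30, 40])

first_row = df.iloc[0]      # position 0, whose label happens to be 10
middle = df.iloc[1:3]       # positions 1 and 2; the stop position is excluded
picked = df.iloc[[0, -1]]   # first and last rows
cell = df.iloc[2, 0]        # row position 2, column position 0

print(first_row, middle, picked, cell, sep="\n")
```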

## Filter with One or More than One Condition

In [None]:
# & --> and
# | --> or
data[data["position"] == "F"]

In [None]:
data[(data["team"] == "Los Angeles Lakers") & (data["rating"] > 85)]

In [None]:
data[(data["country"] != "USA") | (data["rating"] > 85)] 
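
Each condition produces a boolean Series (a mask), and the masks are combined element-wise. The sketch below uses made-up rows. The parentheses around each comparison matter because `&` and `|` bind more tightly than `==` and `>`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "team": ["Los Angeles Lakers", "Los Angeles Lakers", "Boston Celtics", "Boston Celtics"],
        "country": ["USA", "USA", "Spain", "USA"],
        "rating": [97, 80, 78, 90],
    }
)

lakers_and_high = (df["team"] == "Los Angeles Lakers") & (df["rating"] > 85)
foreign_or_high = (df["country"] != "USA") | (df["rating"] > 85)

both = df[lakers_and_high]
either = df[foreign_or_high]
negated = df[~lakers_and_high]   # ~ inverts a mask

print(both, either, negated, sep="\n")
```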

## `.between()` Method

In [None]:
data[data["rating"].between(75, 80)]
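
By default `.between()` includes both bounds. A minimal sketch with made-up ratings; the string values for `inclusive` assume pandas 1.3 or later.

```python
import pandas as pd

ratings = pd.Series([74, 75, 78, 80, 81])

closed = ratings.between(75, 80)                       # 75 <= x <= 80
strict = ratings.between(75, 80, inclusive="neither")  # 75 < x < 80

print(closed.tolist())
print(strict.tolist())
```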

## `.unique()` and `.nunique()` Methods

In [None]:
data["team"].unique()

In [None]:
data["team"].nunique()
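
The two methods treat missing values differently, which can make their results look inconsistent. A short sketch on a made-up Series with one missing entry:

```python
import pandas as pd

teams = pd.Series(["Lakers", "Celtics", "Lakers", None])

distinct = teams.unique()                       # distinct values in order of appearance, missing included
n_distinct = teams.nunique()                    # missing values are not counted by default
n_with_missing = teams.nunique(dropna=False)    # count the missing value as well

print(distinct, n_distinct, n_with_missing)
```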

## `.apply()` Method with Row Values

In [None]:
# Add 5 to every value in the rating column with a list comprehension
[x+5 for x in data["rating"]] # returns a new list; assign it back with data["rating"] = ... to keep the change

In [None]:
data = pd.read_csv("../input/nba2k20-player-dataset/nba2k20-full.csv")
def add_plus_five(x):
    return x+5

data["rating"].apply(add_plus_five)

In [None]:
data = pd.read_csv("../input/nba2k20-player-dataset/nba2k20-full.csv")
data["rating"].apply(lambda x: x+5)
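
For simple arithmetic, `.apply()` gives the same result as the vectorized operator, which is usually the faster choice. The sketch below, on made-up data, checks that and then shows `.apply()` with `axis=1`, where the function receives one row at a time.

```python
import pandas as pd

df = pd.DataFrame({"full_name": ["Player A", "Player B"], "rating": [97, 96]})

via_apply = df["rating"].apply(lambda x: x + 5)
vectorized = df["rating"] + 5
same_result = via_apply.equals(vectorized)

# axis=1 passes each row as a Series, so several columns can be combined.
labels = df.apply(lambda row: f"{row['full_name']} ({row['rating']})", axis=1)

print(same_result)
print(labels.tolist())
```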

## Strings

In [None]:
data = pd.read_csv("../input/nba2k20-player-dataset/nba2k20-full.csv")

## `.str.upper()`, `.str.lower()`, `.str.title()`

In [None]:
data["team"].str.upper() #LOS ANGELES LAKERS
data["team"].str.lower() #los angeles lakers
data["team"].str.title() #Los Angeles Lakers
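
The `.str` methods can be chained, and missing values pass through without raising. A minimal sketch on made-up, deliberately messy strings:

```python
import pandas as pd

teams = pd.Series(["los angeles LAKERS", "  Boston Celtics ", None])

upper = teams.str.upper()
tidy = teams.str.strip().str.title()   # trim whitespace, then capitalize each word

print(upper.tolist())
print(tidy.tolist())
```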

## `.str.split()` 

In [None]:
data["full_name"].str.split(" ").str[0]

In [None]:
data["first_name"] = data["full_name"].str.split(" ").str[0]
data["last_name"] = data["full_name"].str.split(" ").str[1] # second token only; names with more than two parts lose the rest
data.head(3)
data.columns
data = data.reindex(columns = ['full_name', 'first_name', 'last_name', 'rating', 'jersey', 'team', 'position', 'b_day', 'height',
       'weight', 'salary', 'country', 'draft_year', 'draft_round',
       'draft_peak', 'college'])

In [None]:
data.head(3)
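
Names do not always have exactly two parts, so it can help to see how the different indexing choices behave. This sketch uses made-up names; `n=1` with `expand=True` is one possible alternative, not necessarily the right rule for every dataset.

```python
import pandas as pd

names = pd.Series(["Player One", "Player Two Jr.", "Mononym"])

tokens = names.str.split(" ")
second_token = tokens.str[1]   # missing when there is only one token
final_token = tokens.str[-1]   # always the last token

# Split at the first space only and spread the pieces over two columns.
parts = names.str.split(" ", n=1, expand=True)
parts.columns = ["first_name", "last_name"]

print(second_token.tolist())
print(final_token.tolist())
print(parts)
```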

In [None]:
data = pd.read_csv("../input/nba2k20-player-dataset/nba2k20-full.csv")

In [None]:
data["salary"].str.replace("$", "", regex = False).astype("int") # regex = False treats "$" as a literal character, not the end-of-string anchor
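
In a regular expression `$` is the end-of-string anchor, and the default value of the `regex` argument has differed between pandas versions. Passing `regex=False` keeps the behaviour explicit. A minimal sketch on made-up salary strings:

```python
import pandas as pd

salaries = pd.Series(["$1000000", "$2500000"])

# Literal replacement of the dollar sign, then conversion to integers.
cleaned = salaries.str.replace("$", "", regex=False).astype("int64")

print(cleaned.tolist())
print(cleaned.sum())
```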

In [None]:
data["height in metres"] = data["height"].str.split("/").str[1].astype(float)

In [None]:
data.head(3)
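
The height conversion relies on each value holding two measurements separated by a slash. The sketch below assumes a format like `"6-9 / 2.06"` (feet-inches, then metres). That format is an assumption made for illustration, so check it against the actual column before relying on it.

```python
import pandas as pd

# Assumed format: "<feet>-<inches> / <metres>"
heights = pd.Series(["6-9 / 2.06", "6-7 / 2.01"])

# Take the part after the slash, trim the surrounding spaces, convert to float.
metres = heights.str.split("/").str[1].str.strip().astype(float)

print(metres.tolist())
```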

## To be continued...
## Please leave feedback to help improve this notebook
## Thank you!