### Intro 
An index is the structure pandas uses to look up values in a Series or DataFrame by label. It matters because label-based selection and alignment are typically much faster with an index than scanning the values.

Pandas has many index types. Each is specialized for a particular kind of label and optimized for lookups on it.

### Index Types
- **Index** - the generic base type; holds arbitrary labels (for example, column names)
- **Int64Index** - immutable array of 64-bit integers (in pandas 2.0 and later, a plain **Index** with dtype int64)
- **RangeIndex** - the default index. Defined by a start value, a stop value and a step
- **Float64Index** - uses floating-point numbers (in pandas 2.0 and later, a plain **Index** with dtype float64)
- **IntervalIndex** - uses intervals as labels
- **CategoricalIndex** - used for categorical labels
- **DatetimeIndex** - used for dates and times, mostly for time series. Stores timestamps as 64-bit integers, which makes lookups fast
- **PeriodIndex** - used for time periods. Similar to **IntervalIndex**, but each label is a span of time such as a month
- **MultiIndex** - has several hierarchical levels

### Frequency Aliases
You can find them here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
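
As a minimal sketch of how these aliases are used, the snippet below passes two long-standing ones ('D' for calendar day, 'W' for weekly) to pd.date_range. The dates are arbitrary illustration values.

```python
import pandas as pd

# 'D' = calendar-day frequency: one timestamp per day
daily = pd.date_range('2020-08-01', periods=3, freq='D')

# 'W' = weekly frequency, anchored on Sundays by default,
# so the first label is the first Sunday on or after the start date
weekly = pd.date_range('2020-08-01', periods=3, freq='W')

print(daily)
print(weekly)
```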

### Pandas Methods for Creating Indexes
- **pd.IntervalIndex.from_breaks( [ break points ] )** - returns an IntervalIndex built from consecutive break points
- **pd.PeriodIndex( [ list of dates ], freq )** - returns a PeriodIndex with the provided frequency
- **pd.DatetimeIndex( [ datetime values ] )** - returns a DatetimeIndex
- **pd.date_range( 'start_date', 'end_date', freq = ' ' )** - creates a DatetimeIndex with the provided frequency
- **pd.to_datetime( values )** - converts existing dates to the datetime type
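
The constructors above can be sketched side by side as follows. All values are made up for illustration, and the variable names are hypothetical. pd.date_range is included because it is the function that generates a regular sequence between two endpoints, whereas pd.to_datetime only converts values that already exist.

```python
import pandas as pd

# IntervalIndex from break points: n breaks produce n-1 intervals
intervals = pd.IntervalIndex.from_breaks([0, 10, 20, 30])

# PeriodIndex: each label is a whole month rather than an instant
periods = pd.PeriodIndex(['2020-06-01', '2020-07-01'], freq='M')

# DatetimeIndex built directly from date strings
stamps = pd.DatetimeIndex(['2020-08-01', '2020-08-02'])

# to_datetime converts existing values; a list becomes a DatetimeIndex
converted = pd.to_datetime(['2020-08-01', '2020-08-02'])

# date_range generates a regular sequence between two endpoints
generated = pd.date_range('2020-08-01', '2020-08-05', freq='D')

for obj in (intervals, periods, stamps, converted, generated):
    print(type(obj).__name__, len(obj))
```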

### Index Methods
- **idx.argmax( ) / argmin( )** - returns the integer position of the max/min value
- **idx.day / idx.day_name( )** - two of the many accessors for days and dates on a DatetimeIndex (day is an attribute, day_name( ) is a method)
- **idx.duplicated( )** - returns True / False for each label depending on whether it has occurred before
- **idx.has_duplicates** - True if any label is repeated (a property, no parentheses)
- **idx.is_unique** - True if no label is repeated
- **df.sort_index( ascending = True )** - sorts rows by index (use **idx.sort_values( )** on an Index itself)
- **df.set_index( 'column', inplace = True )** - sets the given column as the index
- **df.resample( ' frequency ' )** - resamples a time series to the provided frequency
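
A short sketch of several of these methods on a small hypothetical series (the dates and values are invented for illustration). One timestamp is repeated on purpose so the duplicate checks have something to find.

```python
import pandas as pd

# Hypothetical series with one repeated timestamp
idx = pd.DatetimeIndex(['2020-08-01', '2020-08-02', '2020-08-02', '2020-08-04'])
s = pd.Series([10, 30, 20, 5], index=idx)

print(idx.is_unique)        # property, no parentheses
print(idx.has_duplicates)   # property, the opposite of is_unique
print(idx.duplicated())     # boolean array marking repeats of earlier labels
print(idx.argmax())         # integer position of the latest timestamp
print(idx.day)              # attribute: day of month for each label
print(idx.day_name())       # method: weekday name for each label

# Resampling needs a datetime-like index; 'D' bins by calendar day,
# so the two 2020-08-02 rows are summed and the empty 2020-08-03 bin gets 0
daily_sum = s.resample('D').sum()
print(daily_sum)
```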

To be continued 

In [1]:
import pandas as pd
import numpy as np

In [31]:
# Index
df = pd.DataFrame({'City':['Perm','Moscow'],'Temperature':[20,15]})
print(df.columns)

# Int64Index
df = pd.DataFrame(np.arange(10,21),index=np.arange(0,11))
print(df.index)

# RangeIndex
df = pd.DataFrame(np.arange(11))
print(df.index) # Default Index 

# Float64Index
df = pd.DataFrame(np.arange(11),np.arange(0,11,1.0))
print(df.index)

# IntervalIndex
df = pd.DataFrame({'Col_1':[1,2,3]},index=pd.IntervalIndex.from_breaks([0,0.5,1,1.5]))
print('\n'+str(df.index))

# CategoricalIndex
df = pd.DataFrame({'Col_1':np.arange(6),'Col_2':list('abbbcc')})
df['Col_2'] = df['Col_2'].astype('category')
df = df.set_index('Col_2')
print(df.index)

# DatetimeIndex
dates = pd.date_range('2020/08/01',periods=3,freq='D')
time_series = pd.Series(len(dates),index=dates)
print(time_series.index)

# PeriodIndex 
periods = pd.PeriodIndex(['2020/06/01','2020/07/01','2020/08/01'],freq='M')
print(periods)

Index(['City', 'Temperature'], dtype='object')
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype='int64')
RangeIndex(start=0, stop=11, step=1)
Float64Index([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0], dtype='float64')

IntervalIndex([(0.0, 0.5], (0.5, 1.0], (1.0, 1.5]],
              closed='right',
              dtype='interval[float64]')
CategoricalIndex(['a', 'b', 'b', 'b', 'c', 'c'], categories=['a', 'b', 'c'], ordered=False, name='Col_2', dtype='category')
DatetimeIndex(['2020-08-01', '2020-08-02', '2020-08-03'], dtype='datetime64[ns]', freq='D')
PeriodIndex(['2020-06', '2020-07', '2020-08'], dtype='period[M]', freq='M')


### Index Resetting and Reindexing
Resetting is useful when we need to **reset** the index or **extract** it into ordinary columns of a DataFrame.

We can also **reindex** a DataFrame to make it conform to a new index: existing rows are aligned to the new labels, and labels with no match are filled with NaN.

In [75]:
# Example 
path = 'D:/ML/Books/Learning_Pandas_russian_translation-1-master/Notebooks/Data/sp500.csv'
df = pd.read_csv(path,index_col='Symbol',usecols=[0,2,3,7])
df.head()

Symbol,Sector,Price,Book Value
MMM,Industrials,141.14,26.668
ABT,Health Care,39.6,15.573
ABBV,Health Care,53.95,2.954
ACN,Information Technology,79.79,8.326
ACE,Financials,102.91,86.897


In [38]:
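# Let's extract the index into a column and set a new index (Sector).
# The result is kept in sector_df so that df stays indexed by Symbol for the cells below.
extracted_df = df.reset_index()
sector_df = extracted_df.set_index('Sector')
sector_df.head()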

Sector,Symbol,Price,Book Value
Industrials,MMM,141.14,26.668
Health Care,ABT,39.6,15.573
Health Care,ABBV,53.95,2.954
Information Technology,ACN,79.79,8.326
Financials,ACE,102.91,86.897


In [84]:
# Reindexing conforms the data to a new index (labels with no match in the old index get NaN values)

# Reindexing by rows 
reindexed_df = df.head().reindex(index=['MMM','ACN','NEW'])
reindexed_df

Symbol,Sector,Price,Book Value
MMM,Industrials,141.14,26.668
ACN,Information Technology,79.79,8.326
NEW,,,


In [85]:
# Reindexing by Columns 
reindexed_df = reindexed_df.reindex(columns=['Sector','Price','New_col'])
reindexed_df

Symbol,Sector,Price,New_col
MMM,Industrials,141.14,
ACN,Information Technology,79.79,
NEW,,,
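
The same alignment behaviour can be reproduced without the CSV file. The sketch below uses a small hypothetical frame (df_small, with values copied from the first rows shown above) and assumes fill_value=0 as one possible placeholder. It shows that labels with no match can be filled instead of left as NaN.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of the sp500 data
df_small = pd.DataFrame(
    {'Price': [141.14, 39.60], 'Book Value': [26.668, 15.573]},
    index=pd.Index(['MMM', 'ABT'], name='Symbol'),
)

# Labels absent from the original index produce NaN rows
with_nan = df_small.reindex(index=['MMM', 'NEW'])

# fill_value replaces the NaN placeholders for missing labels
with_zero = df_small.reindex(index=['MMM', 'NEW'], fill_value=0)

# reset_index moves the index back into an ordinary column
flat = df_small.reset_index()

print(with_nan)
print(with_zero)
print(flat)
```

reindex returns a new object, so df_small itself is left unchanged.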


### Hierarchical Indexing
A hierarchical index combines two or more index levels for each row, so every row is identified by a tuple of labels. Each index is called a level.

### Some methods
- **df.index.levels[ level_number ]** - returns the unique labels of the given level
- **df.index.get_level_values( level_number )** - returns the label of every row at the given level
- **df.xs( 'level_value_1' ).xs( 'level_value_2' )** - returns the cross-section of rows matching the given level values

In [87]:
# Let's create a MultiIndex (Hierarchical Index)
reindexed_df = df.reset_index()
multi_idx_df = reindexed_df.set_index(['Sector','Symbol'])
multi_idx_df.head()

Sector,Symbol,Price,Book Value
Industrials,MMM,141.14,26.668
Health Care,ABT,39.6,15.573
Health Care,ABBV,53.95,2.954
Information Technology,ACN,79.79,8.326
Financials,ACE,102.91,86.897


In [101]:
# Let's have a look at the levels (there are two: 0 and 1).
# levels[0] lists the unique labels, get_level_values(0) gives one label per row.
print(multi_idx_df.index.levels[0])
print('\n'+str(multi_idx_df.index.get_level_values(0)))

Index(['Consumer Discretionary', 'Consumer Discretionary ', 'Consumer Staples',
       'Consumer Staples ', 'Energy', 'Financials', 'Health Care',
       'Industrials', 'Industries', 'Information Technology', 'Materials',
       'Telecommunications Services', 'Utilities'],
      dtype='object', name='Sector')

Index(['Industrials', 'Health Care', 'Health Care', 'Information Technology',
       'Financials', 'Health Care', 'Information Technology', 'Utilities',
       'Health Care', 'Financials',
       ...
       'Utilities', 'Information Technology', 'Information Technology',
       'Financials', 'Industrials', 'Information Technology',
       'Consumer Discretionary', 'Health Care', 'Financials', 'Health Care'],
      dtype='object', name='Sector', length=500)


In [115]:
# Application of .xs() method
multi_idx_df.xs('Health Care').xs('ABT')

Price         39.600
Book Value    15.573
Name: ABT, dtype: float64
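
Chaining two .xs() calls is one way to reach a single row. As a sketch of two alternatives, the snippet below builds a small hypothetical two-level frame (prices, mirroring the Sector/Symbol layout) and selects the same row with a tuple passed to .loc, then with the level argument of .xs().

```python
import pandas as pd

# Hypothetical two-level frame mirroring the Sector/Symbol layout
mi = pd.MultiIndex.from_tuples(
    [('Health Care', 'ABBV'), ('Health Care', 'ABT'), ('Industrials', 'MMM')],
    names=['Sector', 'Symbol'],
)
prices = pd.DataFrame({'Price': [53.95, 39.60, 141.14]}, index=mi)

# One lookup with a full tuple key instead of two chained .xs() calls
abt = prices.loc[('Health Care', 'ABT'), :]

# Select on the inner level only; the matched level is dropped from the result
abt_by_symbol = prices.xs('ABT', level='Symbol')

# levels holds the unique labels, get_level_values one label per row
print(prices.index.levels[0])
print(prices.index.get_level_values('Sector'))
print(abt)
print(abt_by_symbol)
```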

In [154]:
# Flag each row whose Price repeats a value seen in an earlier row
df.Price.duplicated()

Symbol
MMM     False
ABT     False
ABBV    False
ACN     False
ACE     False
        ...  
YHOO    False
YUM     False
ZMH     False
ZION    False
ZTS     False
Name: Price, Length: 500, dtype: bool
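
The call above checks the values of a column. The same method exists on the index itself, which checks labels instead. A minimal sketch on a hypothetical frame (dup, with an intentionally repeated label) shows the difference and one common way to drop repeated labels:

```python
import pandas as pd

# Hypothetical frame whose index repeats the label 'ABT'
dup = pd.DataFrame(
    {'Price': [39.60, 53.95, 39.60]},
    index=pd.Index(['ABT', 'ABBV', 'ABT'], name='Symbol'),
)

# Series.duplicated checks values; Index.duplicated checks labels
value_repeats = dup['Price'].duplicated()
label_repeats = dup.index.duplicated(keep='first')

# Keep only the first row for each label
deduped = dup[~label_repeats]

print(value_repeats)
print(label_repeats)
print(deduped)
```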