# Index and Slicing
“Indexing” means referring to an element of an iterable by its position within the iterable. “Slicing” means getting a subset of elements from an iterable based on their indices.
https://towardsdatascience.com/the-basics-of-indexing-and-slicing-python-lists-2d12c90a94cf

## String Index
Strings are indexed by position too. str.index(sub) returns the position of the first occurrence of sub and raises ValueError if it is not found. The optional start and end arguments restrict the search to s[start:end].

In [23]:
names = "Dhiraj Upadhyaya Colonel"
print(names)
print(names.index('Col'))        # starting position of the substring 'Col'
print(names.index('aj', 1, 10))  # search for 'aj' only within names[1:10]

Dhiraj Upadhyaya Colonel
17
4


## List Comprehension

In [1]:
my_list = [ch for ch in 'abcdefghi']  # one list element per character
my_list
#['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
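As a quick check, the comprehension above builds the same list as passing the string straight to list(), which also iterates over it one character at a time. A minimal sketch; the variable names are illustrative only:

```python
# Two equivalent ways to turn a string into a list of single characters
via_comprehension = [ch for ch in 'abcdefghi']
via_list = list('abcdefghi')

print(via_comprehension == via_list)  # True
print(len(via_list))                  # 9
```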

In [2]:
#To retrieve an element of the list, we use the index operator ([]):
my_list[0]

'a'

Lists are “zero indexed”: [0] returns the first (left-most) item in the list, [1] returns the second item (one position to the right), and so on. Since there are 9 elements in our list ([0] through [8]), accessing my_list[9] raises IndexError: list index out of range, because it asks for a tenth element and there isn’t one.
Python also allows you to index from the end of the list using a negative number, where [-1] returns the last element. This is super-useful since it means you don’t have to programmatically find out the length of the iterable in order to work with elements at the end of it. The indices and reverse indices of my_list are as follows:
  0    1    2    3    4    5    6    7    8
  ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
  ↑    ↑    ↑    ↑    ↑    ↑    ↑    ↑    ↑
 -9   -8   -7   -6   -5   -4   -3   -2   -1
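The relationship in the diagram can be checked in code: a negative index -k refers to the same element as position len(my_list) - k, and an index past the end raises IndexError. A small sketch that rebuilds the same nine-letter my_list:

```python
my_list = list('abcdefghi')
n = len(my_list)  # 9

# Every negative index -k points at the same element as position n - k
for k in range(1, n + 1):
    assert my_list[-k] == my_list[n - k]

print(my_list[0], my_list[-1])     # a i
print(my_list[8] == my_list[-1])   # True

# Index 9 is one past the last position, so indexing raises IndexError
try:
    my_list[9]
except IndexError as err:
    print('IndexError:', err)      # IndexError: list index out of range
```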

## Slicing
A slice is a subset of list elements. Without a step (see Stepping below), a slice of a list is always a run of contiguous elements. Slice notation takes the form
my_list[start:stop]
where start is the index of the first element to include, and stop is the index of the first element to leave out (the slice ends just before it). So my_list[1:5] returns ['b', 'c', 'd', 'e']:
  0    1    2    3    4    5    6    7    8
  ×    ↓    ↓    ↓    ↓    ×    ×    ×    ×
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
Leaving the start blank means start from the beginning of the list; leaving the stop blank means go all the way to the end.
Negative indices set the start/stop bounds relative to the end of the list,
so my_list[-5:-2] returns ['e', 'f', 'g']:

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
  ×    ×    ×    ×    ↑    ↑    ↑    ×    ×
 -9   -8   -7   -6   -5   -4   -3   -2   -1

In [4]:
print(my_list[5:])  #['f', 'g', 'h', 'i']
print(my_list[:4])  #['a', 'b', 'c', 'd']
print(my_list[-5:-2])

['f', 'g', 'h', 'i']
['a', 'b', 'c', 'd']
['e', 'f', 'g']
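Unlike indexing, slicing does not raise IndexError for out-of-range boundaries: Python clamps start and stop to the length of the list. A short sketch, again assuming the nine-letter my_list:

```python
my_list = list('abcdefghi')

print(my_list[5:100])    # ['f', 'g', 'h', 'i']  (stop clamped to len(my_list))
print(my_list[-100:2])   # ['a', 'b']            (start clamped to 0)
print(my_list[20:30])    # []                    (entirely past the end)

# Omitted boundaries default to the beginning and end of the list
print(my_list[:] == my_list)                 # True
print(my_list[:4] + my_list[4:] == my_list)  # True
```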


If the boundaries are reversed, as in my_list[-2:-5], you get an empty list.
To be included in the slice, an element must be at or to the right of the start boundary AND to the left of the stop boundary. Because -2 is already to the right of -5, no element satisfies both conditions, so the slice is empty.
range() works exactly the same way: the first loop below produces no output, but the second does:

In [9]:
for i in range(-2, -5):   # empty range: start is already past stop, so nothing prints
    print(i)
print('\n New Line')
for i in range(-5, -2):   # -5, -4, -3
    print(i, end='\t')


 New Line
-5	-4	-3	
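The parallel with range() can be made exact. slice.indices(length) resolves a slice's start, stop and step into concrete non-negative values for a sequence of that length, and passing them to range() yields exactly the positions the slice selects. A sketch comparing the two slices discussed above:

```python
my_list = list('abcdefghi')

for s in (slice(-5, -2), slice(-2, -5)):
    start, stop, step = s.indices(len(my_list))   # resolve negative bounds against len 9
    positions = list(range(start, stop, step))    # same positions the slice visits
    print(s, '->', (start, stop, step), positions, my_list[s])
```

The first slice resolves to (4, 7, 1), i.e. positions 4, 5 and 6; the second resolves to (7, 4, 1), an empty range, which is why both the slice and the loop produce nothing.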

## Stepping
The slice notation can take an optional third value, the step (my_list[start:stop:step]), which sets the interval at which elements are included in the slice. So my_list[::2] returns ['a', 'c', 'e', 'g', 'i']:

  0    1    2    3    4    5    6    7    8
  ↓    ×¹   ↓²   ×¹   ↓²   ×¹   ↓²   ×¹   ↓²
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
And my_list[1::2] returns ['b', 'd', 'f', 'h']:

  0    1    2    3    4    5    6    7    8
  ×    ↓    ×¹   ↓²   ×¹   ↓²   ×¹   ↓²   ×¹
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
Negative step values reverse the direction in which the slicer iterates through the original list. The indexed positions of list elements don’t change, but the order in which the elements are returned does. The sense of the start and stop boundaries is also reversed, so the start value should be the right-most position in the slice, and the stop value should be to the left of that. For example, my_list[5:3:-1] returns ['f', 'e']:
<----<----<----<----<----<----<----<----<--
  0    1    2    3    4    5    6    7    8
  ×    ×    ×    ×    ↓    ↓    ×    ×    ×
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
And my_list[-1:-5:-1] returns ['i', 'h', 'g', 'f']:

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
  ×    ×    ×    ×    ×    ↑    ↑    ↑    ↑
 -9   -8   -7   -6   -5   -4   -3   -2   -1
<----<----<----<----<----<----<----<----<--
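The two diagrams above can be checked directly. The sketch below also shows that, with a negative step, an omitted start defaults to the last element, and that bounds written in left-to-right order produce an empty slice:

```python
my_list = list('abcdefghi')

# Positive bounds, negative step: start is the right-most position
print(my_list[5:3:-1])      # ['f', 'e']

# Negative bounds, negative step
print(my_list[-1:-5:-1])    # ['i', 'h', 'g', 'f']

# With a negative step, an omitted start means "begin at the last element"
print(my_list[:-5:-1])      # ['i', 'h', 'g', 'f']
print(my_list[::-1] == list(reversed(my_list)))  # True

# Bounds in left-to-right order select nothing when the step is negative
print(my_list[3:5:-1])      # []
```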

In [14]:
print(my_list[::2])
print('\n Next Line')
print(my_list[1::2])
print('\n Next Line')
print(my_list[::-1])
print(my_list[-2:-5:-1]) #gives us ['h', 'g', 'f']

['a', 'c', 'e', 'g', 'i']

 Next Line
['b', 'd', 'f', 'h']

 Next Line
['i', 'h', 'g', 'f', 'e', 'd', 'c', 'b', 'a']
['h', 'g', 'f']
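The same slice notation works on strings and tuples, returning a new object of the same type. A sketch reusing the names string from the String Index section (redefined here so the block runs on its own):

```python
names = "Dhiraj Upadhyaya Colonel"

print(names[:6])                     # Dhiraj
print(names[names.index('Col'):])    # Colonel  (slice from the position index() found)
print(names[-7:])                    # Colonel
print(names[::-1][:7])               # lenoloC
print(tuple('abc')[::-1])            # ('c', 'b', 'a')
```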


## Pandas Indexing
https://sparkbyexamples.com/pandas/pandas-index-explained-with-examples/
| Index type | Description |
|---|---|
| RangeIndex | Index implementing a monotonic integer range. |
| CategoricalIndex | Index based on an underlying Categorical. |
| MultiIndex | A multi-level, or hierarchical, Index. |
| IntervalIndex | Immutable index of intervals that are closed on the same side. |
| DatetimeIndex | ndarray-like of datetime64 data. |
| TimedeltaIndex | ndarray of timedelta64 data, represented internally as int64. |
| PeriodIndex | ndarray holding ordinal values indicating regular periods in time. |
| NumericIndex | Index of numpy int/uint/float data (pandas 1.x; in pandas 2.0 and later a plain Index holds numeric data). |
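Each of these index types can be built with a constructor or helper function. The sketch below uses one common helper per type; the sample values are arbitrary, and NumericIndex is left out because its availability depends on the pandas version:

```python
import pandas as pd

range_idx = pd.RangeIndex(start=0, stop=5)                             # 0, 1, 2, 3, 4
cat_idx = pd.CategoricalIndex(['low', 'high', 'low'])                  # categories: high, low
multi_idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)])  # two levels
interval_idx = pd.IntervalIndex.from_breaks([0, 10, 20, 30])           # (0, 10], (10, 20], (20, 30]
dt_idx = pd.date_range('2024-01-01', periods=3, freq='D')              # three consecutive days
td_idx = pd.timedelta_range(start='1 day', periods=3)                  # 1, 2 and 3 days
period_idx = pd.period_range('2024-01', periods=3, freq='M')           # three monthly periods

for idx in (range_idx, cat_idx, multi_idx, interval_idx, dt_idx, td_idx, period_idx):
    print(type(idx).__name__, list(idx))
```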

### Create Index
Syntax of the Index() constructor:

class pandas.Index(data=None, dtype=None, copy=False, name=None, tupleize_cols=True, **kwargs)

- data – array-like (for example, a list) of values to hold in the Index.
- dtype – NumPy-compatible data type. When None, pandas infers the best type from the data.
- copy – bool. Whether to make a copy of the input data.
- name – name of the Index.
- tupleize_cols – when True, attempt to create a MultiIndex if possible (for example, from a list of tuples).
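A minimal sketch exercising the data, name, dtype and tupleize_cols parameters described above; the labels and values are made up for illustration:

```python
import pandas as pd

# data + name: a simple labelled Index
letters = pd.Index(['a', 'b', 'c'], name='letters')
print(letters)

# dtype: request a specific NumPy dtype instead of letting pandas infer one
floats = pd.Index([1, 2, 3], dtype='float64')
print(floats.dtype)   # float64

# tupleize_cols=True (the default) turns a list of tuples into a MultiIndex
pairs = [('a', 1), ('b', 2)]
print(type(pd.Index(pairs)).__name__)                        # MultiIndex
print(type(pd.Index(pairs, tupleize_cols=False)).__name__)   # Index (holding tuples)
```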
       
A Series is created with the pd.Series() constructor, which takes the values as its first argument. If no index is given, the Series gets a default index that starts at 0 and increments by 1.

In [25]:
import pandas as pd

In [26]:
s=pd.Series(['A','B','C','D','E'])
print(s)  # default index: starts at 0 and increments by 1 for each element

0    A
1    B
2    C
3    D
4    E
dtype: object


In [27]:
# Custom Index
idx= ['idx1','idx2','idx3','idx4','idx5']
s=pd.Series(['A','B','C','D','E'],index=idx)
print(s)

idx1    A
idx2    B
idx3    C
idx4    D
idx5    E
dtype: object


In [28]:
# RangeIndex
idx=pd.RangeIndex(5,10)
s=pd.Series(['A','B','C','D','E'],index=idx)
print(s)

5    A
6    B
7    C
8    D
9    E
dtype: object
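With a custom index such as RangeIndex(5, 10), labels and positions no longer coincide, so pandas separates label-based access (.loc) from position-based access (.iloc). Note that label slices include the stop label, unlike the list slices earlier. A sketch using the Series from the cell above:

```python
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'D', 'E'], index=pd.RangeIndex(5, 10))

print(s.index)                # RangeIndex(start=5, stop=10, step=1)
print(s.loc[6])               # B  (label 6)
print(s.iloc[1])              # B  (position 1)
print(s.loc[5:7].tolist())    # ['A', 'B', 'C']  label slices include the stop label
print(s.iloc[0:2].tolist())   # ['A', 'B']       position slices exclude the stop
```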


### DataFrame with Index
When we don’t give labels to the columns and rows (the index), the DataFrame assigns incremental sequence numbers, starting at 0, as labels to both rows and columns.
Sequence numbers make poor column names because they don’t say what data each column holds, so it is best practice to provide descriptive column names. Use the columns and index parameters to provide column and row labels, respectively.

To get the DataFrame index as an array of values, use df.index.values (or df.index.tolist() for a plain Python list).
Note that df.index returns an Index object (for example, a RangeIndex for the default labels), not a Series.

In [31]:
# Create pandas DataFrame from List
import pandas as pd
technologies = [ ["rprogramming",20000, "30days"], 
                 ["python",20000, "40days"], 
               ]
df=pd.DataFrame(technologies)
print(df)

              0      1       2
0  rprogramming  20000  30days
1        python  20000  40days


In [32]:
#Add Column & Row Labels to the DataFrame
column_names=["Courses","Fee","Duration"]
row_label=["a","b"]
df=pd.DataFrame(technologies,columns=column_names,index=row_label)
print(df)

        Courses    Fee Duration
a  rprogramming  20000   30days
b        python  20000   40days


In [33]:
# Get the index (an Index object, not a Series)
print(df.index)

# Get the index labels as a NumPy array
print(df.index.values)

Index(['a', 'b'], dtype='object')
['a' 'b']
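To confirm what df.index and df.index.values actually return, a small sketch that rebuilds the labelled DataFrame from above and also shows the default index of an unlabelled one:

```python
import numpy as np
import pandas as pd

technologies = [["rprogramming", 20000, "30days"],
                ["python", 20000, "40days"]]
df = pd.DataFrame(technologies, columns=["Courses", "Fee", "Duration"], index=["a", "b"])

print(type(df.index).__name__)                  # Index  (not a Series)
print(isinstance(df.index, pd.Series))          # False
print(isinstance(df.index.values, np.ndarray))  # True
print(df.index.tolist())                        # ['a', 'b']  (a plain Python list)

# Without row labels, the default index is a RangeIndex over the two rows
print(pd.DataFrame(technologies).index)         # RangeIndex(start=0, stop=2, step=1)
```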


In [41]:
# Get rows by integer position: every row from position 1 onward
print(df.iloc[1:])

  Courses    Fee Duration
b  python  20000   40days
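iloc selects by integer position, while loc selects by index label; on this DataFrame both can reach the same row. A sketch that rebuilds df so it runs on its own:

```python
import pandas as pd

technologies = [["rprogramming", 20000, "30days"],
                ["python", 20000, "40days"]]
df = pd.DataFrame(technologies, columns=["Courses", "Fee", "Duration"], index=["a", "b"])

by_position = df.iloc[1]     # row at integer position 1, returned as a Series
by_label = df.loc['b']       # the same row, selected by its index label

print(by_position.equals(by_label))   # True
print(by_position.name)               # b  (the row's index label)
print(df.loc['a':'b'].shape)          # (2, 3): label slices include the stop label
print(df.iloc[0:1].shape)             # (1, 3): position slices exclude the stop
```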


In [42]:
df.drop(index=df.iloc[1].name)  # drop by label: df.iloc[1].name is the label of the row at position 1 ('b')

        Courses    Fee Duration
a  rprogramming  20000   30days


In [43]:
df.drop('a')

  Courses    Fee Duration
b  python  20000   40days


In [46]:
df.drop(df.index[1])  # df.index[1] is the label at position 1 ('b')

        Courses    Fee Duration
a  rprogramming  20000   30days
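drop() returns a new DataFrame and leaves df unchanged unless you reassign the result (or pass inplace=True). Passing the whole index drops every row, and columns= drops columns instead of rows. A sketch that rebuilds df so it runs on its own:

```python
import pandas as pd

technologies = [["rprogramming", 20000, "30days"],
                ["python", 20000, "40days"]]
df = pd.DataFrame(technologies, columns=["Courses", "Fee", "Duration"], index=["a", "b"])

trimmed = df.drop(index='a')          # drop one row by label
print(trimmed.index.tolist())         # ['b']
print(df.index.tolist())              # ['a', 'b']  (df itself is unchanged)

emptied = df.drop(df.index)           # dropping every row label leaves no rows
print(emptied.empty)                  # True
print(emptied.columns.tolist())       # ['Courses', 'Fee', 'Duration']

no_fee = df.drop(columns='Fee')       # columns= drops columns instead of rows
print(no_fee.columns.tolist())        # ['Courses', 'Duration']
```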
