# BSE Trend analysis using Numpy

### Step 1 : Scope the project & gather data

#### Scope

Trend analysis on one year of daily BSE (Bombay Stock Exchange) data, aggregated by week

#### Installs

#### Imports

In [1]:
# Default Installs
import os
import glob
import pandas as pd
import numpy as np
import datetime as dt
import json

In [2]:
# Manual installs
# numpy_indexed is by Eelco Hoogendoorn
import numpy_indexed as npi
from itertools import compress 
from tabulate import tabulate

##### Environments

In [3]:
print('getcwd : ', os.getcwd())

getcwd :  D:\BigData\12. Python\4. Jupyter Notebooks


##### Pandas options

In [4]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 200)

# for all rows set to None
# pd.set_option('display.max_rows', None) 

pd.set_option('display.max_rows', 100)

#### Describe and gather data

##### Things to remember
* Data pulled from MySQL should cover whole weeks, with no partial weeks in between

In [5]:
%%time
# pd.datetime was removed from pandas; use the datetime module imported above as dt
ts_parser = lambda x: dt.datetime.strptime(x, "%Y-%m-%d %H:%M:%S")

bse_csv_columns = ["ts", "sc_code", "sc_name", "sc_group", "sc_type"
                 , "open", "high", "low", "close", "last", "prevclose"
                 , "no_trades", "no_of_shrs", "net_turnover", "tdcloindi", "isin"]

df_bse_daily = pd.read_csv(os.path.join(os.getcwd(), '..', '5. BTD', 'data', 'bse_daily_365d.csv'), sep='|'
                           , names=bse_csv_columns
                          ,skip_blank_lines=True
#                          ,parse_dates=['ts'], date_parser = ts_parser)
                          ,parse_dates=['ts'])

Wall time: 9.36 s
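
The read options above can be tried on a small in-memory sample before pointing them at the full file. This is a minimal sketch, assuming the same pipe-separated, header-less layout; the two sample rows, the timestamp format and the shortened column list are illustrative, not the real file contents.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same pipe-separated, header-less layout
sample = (
    "2019-04-01 00:00:00|500002|ABB LTD.|1313.55|1319.85|7785\n"
    "2019-04-02 00:00:00|500002|ABB LTD.|1322.60|1313.55|4482\n"
)
cols = ["ts", "sc_code", "sc_name", "close", "prevclose", "no_of_shrs"]

df_sample = pd.read_csv(
    io.StringIO(sample),
    sep="|",
    names=cols,             # no header row in the data, so supply the names
    skip_blank_lines=True,
    parse_dates=["ts"],     # let pandas parse the timestamp column
)
print(df_sample.dtypes)
```

If `ts` comes back as `datetime64[ns]` on the sample, the same options should behave the same way on the full CSV.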


In [6]:
print(df_bse_daily.shape)

(657058, 16)


#### Python functions

### Step 2 : Explore and assess the data

In [109]:
df_bse_daily.groupby(['sc_group'])['sc_group'].count()

sc_group
A     114330
B     239492
E       2667
F       1150
IF       669
M      15685
MS        66
MT       564
P       1030
R         18
T      30339
TS        33
X     177775
XT     62820
Z      10391
ZP        29
Name: sc_group, dtype: int64

In [7]:
df_temp = df_bse_daily[df_bse_daily.sc_code == 500002][['ts', 'sc_code', 'sc_name', 'high', 'low', 'close', 'prevclose', 'no_of_shrs']]

In [8]:
df_temp.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 245 entries, 0 to 654560
Data columns (total 8 columns):
ts            245 non-null datetime64[ns]
sc_code       245 non-null int64
sc_name       245 non-null object
high          245 non-null float64
low           245 non-null float64
close         245 non-null float64
prevclose     245 non-null float64
no_of_shrs    245 non-null int64
dtypes: datetime64[ns](1), float64(4), int64(2), object(1)
memory usage: 16.3+ KB


In [9]:
df_temp.head()

Unnamed: 0,ts,sc_code,sc_name,high,low,close,prevclose,no_of_shrs
0,2019-04-01,500002,ABB LTD.,1326.6,1306.35,1313.55,1319.85,7785
2791,2019-04-02,500002,ABB LTD.,1342.25,1314.0,1322.6,1313.55,4482
5519,2019-04-03,500002,ABB LTD.,1339.95,1319.7,1329.0,1322.6,27791
8294,2019-04-04,500002,ABB LTD.,1336.1,1317.3,1326.0,1329.0,3106
11005,2019-04-05,500002,ABB LTD.,1379.0,1320.3,1374.75,1326.0,12887


**Observations**
* Column `no_of_shrs` needs a clearer name; it is renamed to `volumes`
* Calculated columns are added: change (`close - prevclose`) and change percent (`((close / prevclose) * 100) - 100`)
* Because the analysis is weekly, a new column `yearWeek` holds the year and the week number

In [10]:
df_temp.rename(columns={"no_of_shrs": "volumes"}, inplace=True)

In [11]:
df_temp.columns

Index(['ts', 'sc_code', 'sc_name', 'high', 'low', 'close', 'prevclose',
       'volumes'],
      dtype='object')

In [12]:
# Last week of year comes as 0 because that week is also beginning of next year
#df_temp['yearWeek'] = df_temp['ts'].dt.year * 100 + df_temp['ts'].dt.week+ 0

# This resolves above problem
df_temp['yearWeek'] = df_temp['ts'].dt.year * 100 + df_temp['ts'].dt.strftime('%U').astype(int)
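
To see what the `%U` key does around a year boundary, here is a small sketch on four hand-picked dates (the dates are chosen for illustration and are not necessarily trading days). It compares the `%U` based key with the ISO week number.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(["2019-04-01", "2019-12-30", "2019-12-31", "2020-01-01"]))

# %U: week of the year with Sunday as the first day of the week;
# days before the first Sunday of the year fall in week 00
week_u = ts.dt.strftime("%U").astype(int)
year_week = ts.dt.year * 100 + week_u

# ISO week number for comparison: 2019-12-30 already belongs to ISO week 1 of 2020
iso_week = ts.apply(lambda d: d.isocalendar()[1])

print(year_week.tolist())   # [201913, 201952, 201952, 202000]
print(iso_week.tolist())    # [14, 1, 1, 1]
```

With the ISO number, the last days of December 2019 would be keyed as 201901 and collide with January 2019. With `%U` that collision goes away, but a week that straddles the new year is split into a week 52 group and a week 00 group.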

In [13]:
# Convert ts from datetime64[ns] to a 'YYYY-MM-DD' string (object dtype), dropping the time part.
# This avoids an error in the last-traded-date section of weekly_trend_analysis()
df_temp['ts'] = df_temp['ts'].dt.strftime('%Y-%m-%d')
#df_temp['ts'] = df_temp['ts'] .dt.strftime('%Y%m%d')

In [14]:
# Change & change percent
df_temp['chng'] = (df_temp['close'] - df_temp['prevclose'])
df_temp['chngp'] = np.round( ( (df_temp['close'] / df_temp['prevclose']) * 100 ) - 100, 2)

##### Analysis data & Indicators

These are the weekly indicators I could think of:
1. closeH - Highest close in the week
1. closeL - Lowest close in the week
1. volHigh - Highest volume in the week
1. volAvg - Volume average
1. daysTraded - Number of days traded in the week
1. HSDL - Highest Single Day Loss
1. HSDG - Highest Single Day Gain
1. HSDLp - Highest Single Day Loss percent
1. HSDGp - Highest Single Day Gain percent
1. first - First close of the week
1. last - Last close of the week
1. wChng - Week change
1. wChngp - Week change percent
1. lastTrdDoW - Last traded day of week
1. TI - Times increased
1. volAvgWOhv - Volume average without the high-volume day
1. HVdAV - High volume / average volume (without the high-volume day)
1. CPveoHVD - Close positive on high volume day
1. lastDVotWk - Last day volume of the week
1. lastDVdAV - Last day volume / average volume (without the high-volume day)
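
As a worked example, the sketch below computes several of these indicators for a single week, using the five ABB rows printed by `df_temp.head()` above. It is plain NumPy on one week only; the variable names `up_day` and `ix_hv` are introduced here for illustration.

```python
import numpy as np

# First trading week of ABB, taken from the df_temp.head() output above
close = np.array([1313.55, 1322.6, 1329.0, 1326.0, 1374.75])
prevclose = np.array([1319.85, 1313.55, 1322.6, 1329.0, 1326.0])
volumes = np.array([7785, 4482, 27791, 3106, 12887])

closeH, closeL = close.max(), close.min()
wChng = close[-1] - close[0]                   # last close minus first close
wChngp = (close[-1] / close[0]) * 100 - 100

up_day = close > prevclose                     # True on days that closed higher
TI = int(up_day.sum())                         # times increased

ix_hv = int(np.argmax(volumes))                # position of the highest-volume day
volHigh = volumes[ix_hv]
volAvgWOhv = np.delete(volumes, ix_hv).mean()  # average without that day
HVdAV = volHigh / volAvgWOhv
CPveoHVD = int(up_day[ix_hv])                  # 1 if the high-volume day closed higher
lastDVdAV = volumes[-1] / volAvgWOhv

print(closeH, closeL, round(wChng, 2), round(wChngp, 2), TI)
print(volHigh, volAvgWOhv, round(HVdAV, 2), CPveoHVD, round(lastDVdAV, 2))
```

These values can be compared with the first row (yearWeek 201913) of the weekly table built later in the notebook.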

**Converting df columns to numpy array**

In [15]:
arr_yearWeek = df_temp['yearWeek'].to_numpy()
arr_close = df_temp['close'].to_numpy()
arr_prevclose = df_temp['prevclose'].to_numpy()
arr_chng = df_temp['chng'].to_numpy()
arr_chngp = df_temp['chngp'].to_numpy()
arr_ts = df_temp['ts'].to_numpy()
arr_volumes = df_temp['volumes'].to_numpy()

**Calculating max(close) & min(close)**

**Pandas**

In [16]:
%%time
a = df_temp[['yearWeek', 'close']].to_numpy()
b = npi.group_by(a[:, 0]).split(a[:, 1])

Wall time: 10.8 ms


In [21]:
a[:][:5]

array([[201913.  ,   1313.55],
       [201913.  ,   1322.6 ],
       [201913.  ,   1329.  ],
       [201913.  ,   1326.  ],
       [201913.  ,   1374.75]])

In [23]:
b[:][:5]

[array([1313.55, 1322.6 , 1329.  , 1326.  , 1374.75]),
 array([1398.2 , 1408.4 , 1379.05, 1403.4 , 1426.2 ]),
 array([1421.4, 1408.6, 1432.3]),
 array([1433.05, 1488.45, 1484.45, 1493.9 , 1477.5 ]),
 array([1473.4 , 1473.9 , 1490.55])]

**Numpy**

In [24]:
%%time
arr_concat = np.column_stack((arr_yearWeek, arr_close))
npi_gb = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

Wall time: 1 ms


In [26]:
arr_concat[:][:5]

array([[201913.  ,   1313.55],
       [201913.  ,   1322.6 ],
       [201913.  ,   1329.  ],
       [201913.  ,   1326.  ],
       [201913.  ,   1374.75]])

In [27]:
npi_gb[:][:5]

[array([1313.55, 1322.6 , 1329.  , 1326.  , 1374.75]),
 array([1398.2 , 1408.4 , 1379.05, 1403.4 , 1426.2 ]),
 array([1421.4, 1408.6, 1432.3]),
 array([1433.05, 1488.45, 1484.45, 1493.9 , 1477.5 ]),
 array([1473.4 , 1473.9 , 1490.55])]

**Observations**    
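* Both cells group with `numpy_indexed`; they differ only in how the input array is built. Stacking the pre-extracted arrays with `np.column_stack` took 1 ms, against 10.8 ms for slicing the DataFrame with `to_numpy()`, and the resulting groups are identical. The rest of the notebook therefore uses the `np.column_stack` route. These are single `%%time` runs, so the gap is indicative only.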

In [31]:
%%time
arr_concat = np.column_stack((arr_yearWeek, arr_close))
npi_gb = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

cmaxs, cmins = [], []
first, last, wChng, wChngp = [], [], [], []
for idx,subarr in enumerate(npi_gb):
    cmaxs.append( np.amax(subarr) )
    cmins.append( np.amin(subarr) )

Wall time: 4.93 ms


In [33]:
print(cmaxs)
print(cmins)

[1374.75, 1426.2, 1432.3, 1493.9, 1490.55, 1481.3, 1392.05, 1494.65, 1610.7, 1591.85, 1566.7, 1566.0, 1605.45, 1653.2, 1533.05, 1518.35, 1403.0, 1394.5, 1397.7, 1378.35, 1375.05, 1400.05, 1331.15, 1349.25, 1413.3, 1573.4, 1521.0, 1486.85, 1552.65, 1540.05, 1517.05, 1485.75, 1491.05, 1471.4, 1446.1, 1479.45, 1486.2, 1479.85, 1275.75, 1287.35, 1321.45, 1367.25, 1377.1, 1321.8, 1341.35, 1338.45, 1340.05, 1228.9, 1200.05, 1193.9, 1126.65, 964.65, 870.1]
[1313.55, 1379.05, 1408.6, 1433.05, 1473.4, 1376.95, 1346.55, 1426.0, 1571.0, 1527.6, 1509.8, 1531.8, 1558.9, 1543.35, 1484.85, 1414.5, 1361.1, 1350.55, 1381.3, 1349.95, 1339.1, 1333.0, 1316.05, 1324.3, 1322.15, 1460.5, 1460.4, 1464.5, 1473.65, 1496.1, 1448.35, 1448.5, 1457.05, 1412.9, 1407.25, 1439.6, 1449.7, 1266.4, 1255.15, 1284.1, 1287.0, 1286.7, 1303.75, 1291.8, 1281.45, 1294.25, 1234.7, 1207.4, 1184.55, 1163.6, 1011.3, 893.0, 827.9]
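
The per-week max and min can also be computed without a Python loop. The sketch below uses `np.unique(..., return_index=True)` with `ufunc.reduceat` in place of the `numpy_indexed` split. It assumes the rows are already sorted by `yearWeek`, as they are when the daily data is in date order; the sample holds the first three weeks shown above.

```python
import numpy as np

# First three weeks of closes from the grouped output above
year_week = np.array([201913] * 5 + [201914] * 5 + [201915] * 3)
close = np.array([1313.55, 1322.6, 1329.0, 1326.0, 1374.75,
                  1398.2, 1408.4, 1379.05, 1403.4, 1426.2,
                  1421.4, 1408.6, 1432.3])

# starts[i] is the position of the first row of week i (valid for sorted keys)
weeks, starts = np.unique(year_week, return_index=True)

# reduceat applies the reduction to each slice close[starts[i]:starts[i+1]]
cmaxs = np.maximum.reduceat(close, starts)
cmins = np.minimum.reduceat(close, starts)

print(weeks)
print(cmaxs)
print(cmins)
```

The results match the first three entries of `cmaxs` and `cmins` printed above.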


**All metrics are calculated in this single cell**

In [36]:
%%time
# Reference: https://stackoverflow.com/questions/38013778/is-there-any-numpy-group-by-function

arr_yearWeek = df_temp['yearWeek'].to_numpy()
arr_close = df_temp['close'].to_numpy()
arr_prevclose = df_temp['prevclose'].to_numpy()
arr_chng = df_temp['chng'].to_numpy()
arr_chngp = df_temp['chngp'].to_numpy()
arr_ts = df_temp['ts'].to_numpy()
arr_volumes = df_temp['volumes'].to_numpy()

# Close
arr_concat = np.column_stack((arr_yearWeek, arr_close))
npi_gb = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

cmaxs, cmins = [], []
first, last, wChng, wChngp = [], [], [], []
for idx,subarr in enumerate(npi_gb):
    cmaxs.append( np.amax(subarr) )
    cmins.append( np.amin(subarr) )
    first.append(subarr[0])
    last.append(subarr[-1])
    wChng.append( subarr[-1] - subarr[0] )
    wChngp.append( ( (subarr[-1] / subarr[0]) * 100) - 100 )

    
yearWeek, daysTraded = np.unique(arr_concat[:,0], return_counts=True)

npi_gb = None
arr_concat = None

# Chng
#e = df_temp[['yearWeek', 'chng']].to_numpy()
#f = npi.group_by(e[:, 0]).split(e[:, 1])
arr_concat = np.column_stack((arr_yearWeek, arr_chng))
npi_gb = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

HSDL, HSDG = [], []
for idx,subarr in enumerate(npi_gb):
    HSDL.append( np.amin(subarr) )
    HSDG.append( np.amax(subarr) )

npi_gb = None
arr_concat = None

# Chngp
#g = df_temp[['yearWeek', 'chngp']].to_numpy()
#h = npi.group_by(g[:, 0]).split(g[:, 1])
arr_concat = np.column_stack((arr_yearWeek, arr_chngp))
npi_gb = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

HSDLp, HSDGp = [], []
for idx,subarr in enumerate(npi_gb):
    HSDLp.append( np.amin(subarr) )
    HSDGp.append( np.amax(subarr) )

npi_gb = None
arr_concat = None

# Last traded date of the week
# Resolved "TypeError: invalid type promotion" by converting 'ts' to a date string.
# The error came from stacking arr_yearWeek (int64) with arr_ts (datetime64[ns]).
# If ts were still datetime64[ns] here, alternatives would be arr_ts.tolist() or arr_ts.astype(object).
# The alternative below is left commented out.
#i = df_temp[['yearWeek', 'ts']].to_numpy()
#j = npi.group_by(i[:, 0]).split(i[:, 1])
arr_concat = np.column_stack((arr_yearWeek, arr_ts))
npi_gb = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

lastTrdDoW = []
for idx,subarr in enumerate(npi_gb):
    lastTrdDoW.append( subarr[-1] )

npi_gb = None
arr_concat = None

# Times increased
#yearWeek = df_temp['yearWeek'].to_numpy()
#close = df_temp['close'].to_numpy()
#prevclose = df_temp['prevclose'].to_numpy()
TI = np.where(arr_close > arr_prevclose, 1, 0)

# Below npi_gb_yearWeekTI is used in volumes section
arr_concat = np.column_stack((arr_yearWeek, TI))
npi_gb_yearWeekTI = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

#TI = []
#for idx,subarr in enumerate(t_group):
#    TI.append( np.sum(subarr) )
#Above 3 lines replaced with below
tempArr, TI = npi.group_by(arr_yearWeek).sum(TI)


# Volume (depends on npi_gb_yearWeekTI from the section above, which is why it was moved from the top to here)
arr_concat = np.column_stack((arr_yearWeek, arr_volumes))
npi_gb = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

#c = df_temp[['yearWeek', 'volumes']].to_numpy()
#d = npi.group_by(c[:, 0]).split(c[:, 1])

vmaxs, vavgs, volAvgWOhv, HVdAV, CPveoHVD, lastDVotWk, lastDVdAV = [], [], [], [], [], [], []
for idx,subarr in enumerate(npi_gb):
    vavgs.append( np.mean(subarr) )
    ldvotWk = subarr[-1]
    lastDVotWk.append(ldvotWk)
        
    #print(idx, 'O - ',subarr, np.argmax(subarr), ', average : ',np.mean(subarr))
    ixDel = np.argmax(subarr)
    hV = subarr[ixDel]
    vmaxs.append( hV )
    
    subarr = np.delete(subarr, ixDel)
    vawoHV = np.mean(subarr)
    volAvgWOhv.append( vawoHV )
    HVdAV.append(hV / vawoHV)
    CPveoHVD.append( npi_gb_yearWeekTI[idx][ixDel] )
    lastDVdAV.append(ldvotWk / vawoHV)    
    #print(idx, 'N - ',subarr, ', average : ',vawoHV, 'hV : ', hV, 'HVdAV : ',hV / vawoHV 
    #      , ', CPveoHVD : ', npi_gb_yearWeekTI[idx][ixDel], 'lastDVoTWK : ',ldvotWk, 'lastDVdAV : ', ldvotWk / vawoHV)
    
npi_gb = None
arr_concat = None

Wall time: 17.9 ms


In [43]:
# Mixed data types are about to be stored in one NumPy array, which is probably bad practice
print(type(arr_yearWeek))
print(type(arr_yearWeek[0]))
print(type(arr_ts[0]))
print(type( list(arr_ts) ))
print(arr_yearWeek.shape)

<class 'numpy.ndarray'>
<class 'numpy.int64'>
<class 'str'>
<class 'list'>
(245,)


### Numpy Tests

**column_stack**

In [44]:
np_test = np.column_stack((yearWeek, first, last))
df_test = pd.DataFrame(data=np_test)
df_test.head()

Unnamed: 0,0,1,2
0,201913.0,1313.55,1374.75
1,201914.0,1398.2,1426.2
2,201915.0,1421.4,1432.3
3,201916.0,1433.05,1477.5
4,201917.0,1473.4,1490.55


**Compress() - Testing**

In [49]:
a = np.array(['abc', 'def', 'ghi', 'abc', 'def', 'ghi', 'abc'])
print(type(a))
print(a)
b = np.array(['abc'])

<class 'numpy.ndarray'>
['abc' 'def' 'ghi' 'abc' 'def' 'ghi' 'abc']


In [50]:
tf_1 = np.isin(a, b) 
print(tf_1)
t_result = list(compress(range(len(tf_1)), tf_1)) 
#print(t_result)
weekly_all = a[t_result]
weekly_all

[ True False False  True False False  True]


array(['abc', 'abc', 'abc'], dtype='<U3')

**Numpy array slice testing**

In [52]:
# generating Test data (rows, columns)
x = np.arange(35).reshape(7, 5)
print(x.shape)
print(x)
a = x[:,0]
b = [15, 25]
print('a = ',a)
print('b = ',b)

(7, 5)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]
 [25 26 27 28 29]
 [30 31 32 33 34]]
a =  [ 0  5 10 15 20 25 30]
b =  [15, 25]


In [172]:
print('ISIN = ', np.isin(a, b) )
print('ISIN(invert) = ', np.isin(a, b, invert=True) )
tf_1 = np.isin(a, b, invert=True) 
t_result = list(compress(range(len(tf_1)), tf_1)) 
print('Compress index = ',t_result)
print(a[t_result])
print(x[t_result])

ISIN =  [False False False  True False  True False]
ISIN(invert) =  [ True  True  True False  True False  True]
Compress index =  [0, 1, 2, 4, 6]
[ 0  5 10 20 30]
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]]


In [169]:
t_result = [True, False, True, False, True, False, True ]
x[t_result]

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34]])
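
Since a boolean mask indexes rows directly, the `compress()` step is optional. The sketch below reuses the same test data and shows two NumPy-only routes next to the `compress()` one, with `np.flatnonzero` as the stand-in for the index list.

```python
from itertools import compress
import numpy as np

x = np.arange(35).reshape(7, 5)
a = x[:, 0]
b = [15, 25]

mask = np.isin(a, b, invert=True)

# Route used above: turn the mask into integer positions with compress()
idx_compress = list(compress(range(len(mask)), mask))

# NumPy-only equivalents
idx_numpy = np.flatnonzero(mask)   # integer positions of the True entries
rows_by_mask = x[mask]             # index with the boolean mask directly

print(idx_numpy)                                       # [0 1 2 4 6]
print(np.array_equal(x[idx_compress], rows_by_mask))   # True
```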

**Vertical Stack**

In [55]:
vs = []

sc_code = np.full(5, '500002')
symbol = np.full(5, 'ABB')
np_test = np.column_stack((sc_code, symbol))
vs = np.vstack((np_test))    
print(vs)
sc_code = np.full(5, '500003')
symbol = np.full(5, 'ZZZ')
np_test = np.column_stack((sc_code, symbol))
vs = np.vstack((vs, np_test))    
print(vs)

[['500002' 'ABB']
 ['500002' 'ABB']
 ['500002' 'ABB']
 ['500002' 'ABB']
 ['500002' 'ABB']]
[['500002' 'ABB']
 ['500002' 'ABB']
 ['500002' 'ABB']
 ['500002' 'ABB']
 ['500002' 'ABB']
 ['500003' 'ZZZ']
 ['500003' 'ZZZ']
 ['500003' 'ZZZ']
 ['500003' 'ZZZ']
 ['500003' 'ZZZ']]


In [56]:
df_test = pd.DataFrame(data=vs)
df_test

Unnamed: 0,0,1
0,500002,ABB
1,500002,ABB
2,500002,ABB
3,500002,ABB
4,500002,ABB
5,500003,ZZZ
6,500003,ZZZ
7,500003,ZZZ
8,500003,ZZZ
9,500003,ZZZ
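
Each `np.vstack` call copies all of its inputs, so stacking inside a loop re-copies the growing result every time. When many tickers are processed, one option is to collect the blocks in a list and stack once at the end. A minimal sketch, with a hypothetical two-ticker list:

```python
import numpy as np

# Hypothetical per-ticker blocks, shaped like np_test above
tickers = [("500002", "ABB"), ("500003", "ZZZ")]

blocks = []
for sc_code, symbol in tickers:
    block = np.column_stack((np.full(5, sc_code), np.full(5, symbol)))
    blocks.append(block)           # cheap: only stores a reference

vs = np.vstack(blocks)             # one copy at the end
print(vs.shape)                    # (10, 2)
```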


**The method below removes the latest yearWeek from the history so that the freshly computed latest yearWeek can be appended in its place**

In [48]:
a = np.full(5, '201901')
a = np.append(a, np.full(5, '201902'))
b = np.full(5, '201902')
b = np.append(b, np.full(5, '201903'))
print('a = ',a)
print('b = ',b)

set_a = set(a)
set_b = set(b)
print('Set difference (yearWeek values that b does not have) = ',set_a.difference(set_b))

print('\n which items in a are available in b')
tf_1 = np.isin(a, b) 
print('np.isin = ',tf_1)
tf_1 = np.isin(a, b, invert=True) 
print('np.isin(invert) = ',tf_1)

t_result = list(compress(range(len(tf_1)), tf_1)) 
print('Compress indexes(get indexes of true items) = ',t_result)
print('Final Result = ',a[t_result])

a =  ['201901' '201901' '201901' '201901' '201901' '201902' '201902' '201902'
 '201902' '201902']
b =  ['201902' '201902' '201902' '201902' '201902' '201903' '201903' '201903'
 '201903' '201903']
Set difference (yearWeek values that b does not have) =  {'201901'}

 which items in a are available in b
np.isin =  [False False False False False  True  True  True  True  True]
np.isin(invert) =  [ True  True  True  True  True False False False False False]
Compress indexes(get indexes of true items) =  [0, 1, 2, 3, 4]
Final Result =  ['201901' '201901' '201901' '201901' '201901']
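
Applied to a 2-D history array, the same idea can be wrapped in a small helper. This is a sketch only: `merge_weekly` and its `key_col` parameter are hypothetical names, and the two-column arrays are made-up stand-ins for the weekly rows.

```python
import numpy as np

def merge_weekly(history, latest, key_col=0):
    """Drop rows of `history` whose key also appears in `latest`, then append `latest`."""
    if len(history) == 0:
        return latest.copy()
    # True for history rows whose key is NOT present in the latest batch
    keep = np.isin(history[:, key_col], latest[:, key_col], invert=True)
    return np.vstack((history[keep], latest))

# Made-up rows: column 0 is the yearWeek key, column 1 a value
history = np.array([["201901", "10.0"], ["201902", "11.0"]])
latest = np.array([["201902", "11.5"], ["201903", "12.0"]])

merged = merge_weekly(history, latest)
print(merged)
```

Week 201902 appears once in the result, carrying the value from `latest`.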


**Joining all numpy arrays to dataframe**

In [57]:
%%time
# yearWeek and occurrences
#yearWeek, daysTraded = np.unique(a[:,0], return_counts=True)
yearWeek = yearWeek.astype(int)
HSDL = np.round(HSDL,2)
HSDG = np.round(HSDG,2)
HSDLp = np.round(HSDLp,2)
HSDGp = np.round(HSDGp,2)

first = np.round(first,2)
last = np.round(last,2)
wChng = np.round(wChng,2)
wChngp = np.round(wChngp,2)

vavgs = np.array(vavgs).astype(int)
volAvgWOhv = np.array(volAvgWOhv).astype(int)
HVdAV = np.round(HVdAV,2)

#dict_temp = {'yearWeek': yearWeek, 'closeH': cmaxs, 'closeL': cmins, 'volHigh':vmaxs, 'volAvg':vavgs, 'daysTraded':daysTraded
#            ,'HSDL':HSDL, 'HSDG':HSDG, 'HSDLp':HSDLp, 'HSDGp':HSDGp, 'first':first, 'last':last, 'wChng':wChng, 'wChngp':wChngp
#            ,'lastTrdDoW':lastTrdDoW, 'TI':TI, 'volAvgWOhv':volAvgWOhv, 'HVdAV':HVdAV, 'CPveoHVD':CPveoHVD
#            ,'lastDVotWk':lastDVotWk, 'lastDVdAV':lastDVdAV}
#df_weekly_new = pd.DataFrame(data=dict_temp)

cols = ['yearWeek', 'lastTrdDoW', 'daysTraded', 'closeL', 'closeH', 'volAvg', 'volHigh'
             , 'HSDL', 'HSDG', 'HSDLp', 'HSDGp', 'first', 'last', 'wChng', 'wChngp', 'TI', 'volAvgWOhv', 'HVdAV'
             , 'CPveoHVD', 'lastDVotWk', 'lastDVdAV']

ticker = np.full(yearWeek.shape[0], df_temp['sc_code'].to_numpy()[0])
np_weekly_new = np.column_stack((yearWeek, lastTrdDoW, daysTraded, cmins, cmaxs, vavgs, vmaxs, HSDL
                               , HSDG, HSDLp, HSDGp, first, last, wChng, wChngp, TI, volAvgWOhv, HVdAV
                               , CPveoHVD, lastDVotWk, lastDVdAV))
df_weekly_new = pd.DataFrame(data=np_weekly_new, columns=cols)

df_weekly_new.head(10)

Wall time: 8 ms


Unnamed: 0,yearWeek,lastTrdDoW,daysTraded,closeL,closeH,volAvg,volHigh,HSDL,HSDG,HSDLp,HSDGp,first,last,wChng,wChngp,TI,volAvgWOhv,HVdAV,CPveoHVD,lastDVotWk,lastDVdAV
0,201913,2019-04-05,5,1313.55,1374.75,11210,27791,-6.3,48.75,-0.48,3.68,1313.55,1374.75,61.2,4.66,3,7065,3.93,1,12887,1.8240622788393488
1,201914,2019-04-12,5,1379.05,1426.2,16741,34789,-29.35,24.35,-2.08,1.77,1398.2,1426.2,28.0,2.0,4,12229,2.84,1,6928,0.5665106200298464
2,201915,2019-04-18,3,1408.6,1432.3,9008,21127,-12.8,23.7,-0.9,1.68,1421.4,1432.3,10.9,0.77,1,2948,7.17,1,21127,7.165338307614041
3,201916,2019-04-26,5,1433.05,1493.9,23703,50389,-16.4,55.4,-1.1,3.87,1433.05,1477.5,44.45,3.1,3,17032,2.96,1,23219,1.3632573978393612
4,201917,2019-05-03,3,1473.4,1490.55,13102,27577,-4.1,16.65,-0.28,1.13,1473.4,1490.55,17.15,1.16,2,5865,4.7,0,4414,0.7525360156849373
5,201918,2019-05-10,5,1376.95,1481.3,19429,47397,-56.65,-9.25,-3.91,-0.62,1481.3,1376.95,-104.35,-7.04,0,12437,3.81,0,12120,0.9745115381522876
6,201919,2019-05-17,5,1346.55,1392.05,7935,31284,-30.4,24.3,-2.21,1.8,1346.55,1392.05,45.5,3.38,4,2098,14.91,1,2259,1.0763549731983324
7,201920,2019-05-24,5,1426.0,1494.65,10970,37266,-22.85,65.3,-1.57,4.57,1431.9,1494.65,62.75,4.38,3,4397,8.48,1,37266,8.47532408460314
8,201921,2019-05-31,5,1571.0,1610.7,32984,119315,-20.8,116.05,-1.29,7.76,1610.7,1571.0,-39.7,-2.46,1,11401,10.47,0,119315,10.465080583269378
9,201922,2019-06-07,4,1527.6,1591.85,10052,21913,-32.6,36.75,-2.09,2.41,1591.85,1564.35,-27.5,-1.73,2,6099,3.59,0,8289,1.3590752582390555


In [58]:
np_weekly_new.dtype

dtype('<U32')

In [59]:
np_weekly_new

array([['201913', '2019-04-05', '5', ..., '1', '12887',
        '1.8240622788393488'],
       ['201914', '2019-04-12', '5', ..., '1', '6928',
        '0.5665106200298464'],
       ['201915', '2019-04-18', '3', ..., '1', '21127',
        '7.165338307614041'],
       ...,
       ['202010', '2020-03-13', '4', ..., '0', '14051',
        '2.1394203928335784'],
       ['202011', '2020-03-20', '5', ..., '0', '5207',
        '1.1244398855476974'],
       ['202012', '2020-03-27', '5', ..., '0', '5204',
        '1.2823261257931375']], dtype='<U32')
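
The `<U32` dtype arises because a NumPy array holds a single dtype, so stacking the date strings with the numeric columns turns every value into a string. One way to keep per-column dtypes is to build the DataFrame from a dict of arrays, much like the commented-out `dict_temp` in the cell above. A minimal sketch with a few columns from the first three weeks:

```python
import numpy as np
import pandas as pd

# A few columns from the first three weeks of the table above
year_week = np.array([201913, 201914, 201915])
last_trd = np.array(["2019-04-05", "2019-04-12", "2019-04-18"])
days_traded = np.array([5, 5, 3])
close_h = np.array([1374.75, 1426.2, 1432.3])

# Each dict entry becomes one column and keeps its own dtype
df = pd.DataFrame({
    "yearWeek": year_week,
    "lastTrdDoW": last_trd,
    "daysTraded": days_traded,
    "closeH": close_h,
})
print(df.dtypes)
```

The trade-off is that the result is a DataFrame rather than one 2-D array, so the `np.vstack` based history merge would need a pandas equivalent such as `pd.concat`.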

**Faster way to read a cell in pandas**

In [71]:
%%time
#print(df_bse_daily.shape)
print(df_bse_daily.iloc[0]['sc_code'])

500002
Wall time: 13.2 ms


In [78]:
%%time
df_bse_daily.head(1)['sc_code'].values[0]

Wall time: 2 ms


500002

In [86]:
%%time
df_bse_daily.at[0, 'sc_code']

Wall time: 1 ms


500002

In [84]:
%%time
df_bse_daily['sc_code'].to_numpy()[0]

Wall time: 0 ns


500002
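
`%%time` measures a single run, and a reading of `0 ns` means the call finished below the timer's resolution rather than taking no time. For a steadier comparison, the same four lookups can be repeated with `timeit`. The sketch below uses a small synthetic frame with placeholder `sc_code` values; absolute numbers will differ on the real 657,058-row frame.

```python
import timeit
import numpy as np
import pandas as pd

# Small synthetic frame; the sc_code values are placeholders
df = pd.DataFrame({"sc_code": np.arange(500002, 500002 + 1000)})

candidates = {
    "iloc":     lambda: df.iloc[0]["sc_code"],
    "head":     lambda: df.head(1)["sc_code"].values[0],
    "at":       lambda: df.at[0, "sc_code"],
    "to_numpy": lambda: df["sc_code"].to_numpy()[0],
}

runs = 1000
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=runs)
    print(f"{name:9s} {seconds / runs * 1e6:8.2f} us per call")
```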

In [117]:
ticker = df_temp['sc_code'].to_numpy()[0]
df_weekly_new['sc_code'] = ticker

In [60]:
df_weekly_new.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53 entries, 0 to 52
Data columns (total 21 columns):
yearWeek      53 non-null object
lastTrdDoW    53 non-null object
daysTraded    53 non-null object
closeL        53 non-null object
closeH        53 non-null object
volAvg        53 non-null object
volHigh       53 non-null object
HSDL          53 non-null object
HSDG          53 non-null object
HSDLp         53 non-null object
HSDGp         53 non-null object
first         53 non-null object
last          53 non-null object
wChng         53 non-null object
wChngp        53 non-null object
TI            53 non-null object
volAvgWOhv    53 non-null object
HVdAV         53 non-null object
CPveoHVD      53 non-null object
lastDVotWk    53 non-null object
lastDVdAV     53 non-null object
dtypes: object(21)
memory usage: 4.4+ KB
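
Every column is `object` here because the frame was built from the string-typed array. If numeric dtypes are needed afterwards, one option is `pd.to_numeric` on the numeric columns. A minimal sketch on a two-row string frame, with a shortened column list:

```python
import pandas as pd

# String-typed frame, mimicking what comes out of the '<U32' array
df = pd.DataFrame(
    [["201913", "2019-04-05", "5", "1374.75"],
     ["201914", "2019-04-12", "5", "1426.2"]],
    columns=["yearWeek", "lastTrdDoW", "daysTraded", "closeH"],
)

# Convert only the columns that hold numbers; lastTrdDoW stays a string
numeric_cols = ["yearWeek", "daysTraded", "closeH"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)

print(df.dtypes)
```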


##### Reorder columns

In [101]:
cols = df_weekly_new.columns.tolist()
cols

['CPveoHVD',
 'HSDG',
 'HSDGp',
 'HSDL',
 'HSDLp',
 'HVdAV',
 'TI',
 'closeH',
 'closeL',
 'daysTraded',
 'first',
 'last',
 'lastDVdAV',
 'lastDVotWk',
 'lastTrdDoW',
 'volAvg',
 'volAvgWOhv',
 'volHigh',
 'wChng',
 'wChngp',
 'yearWeek']

In [120]:
cols2 = ['sc_code', 'yearWeek', 'lastTrdDoW', 'daysTraded', 'closeL', 'closeH', 'volAvg', 'volHigh'
         , 'HSDL', 'HSDG', 'HSDLp', 'HSDGp', 'first', 'last', 'wChng', 'wChngp', 'TI', 'volAvgWOhv', 'HVdAV'
         , 'CPveoHVD', 'lastDVotWk', 'lastDVdAV']

In [121]:
df_weekly_new = df_weekly_new[cols2].copy()

In [122]:
df_weekly_new.head()

Unnamed: 0,sc_code,yearWeek,lastTrdDoW,daysTraded,closeL,closeH,volAvg,volHigh,HSDL,HSDG,HSDLp,HSDGp,first,last,wChng,wChngp,TI,volAvgWOhv,HVdAV,CPveoHVD,lastDVotWk,lastDVdAV
0,500002,201913,2019-04-05,5,1313.55,1374.75,11210,27791,-6.3,48.75,-0.48,3.68,1313.55,1374.75,61.2,4.66,3,7065,3.93,1,12887,1.824062
1,500002,201914,2019-04-12,5,1379.05,1426.2,16741,34789,-29.35,24.35,-2.08,1.77,1398.2,1426.2,28.0,2.0,4,12229,2.84,1,6928,0.566511
2,500002,201915,2019-04-18,3,1408.6,1432.3,9008,21127,-12.8,23.7,-0.9,1.68,1421.4,1432.3,10.9,0.77,1,2948,7.17,1,21127,7.165338
3,500002,201916,2019-04-26,5,1433.05,1493.9,23703,50389,-16.4,55.4,-1.1,3.87,1433.05,1477.5,44.45,3.1,3,17032,2.96,1,23219,1.363257
4,500002,201917,2019-05-03,3,1473.4,1490.55,13102,27577,-4.1,16.65,-0.28,1.13,1473.4,1490.55,17.15,1.16,2,5865,4.7,0,4414,0.752536


In [112]:
if set(cols) == set(cols2):
    print('Same')

Same


In [110]:
print(cols)
print(cols2)

['CPveoHVD', 'HSDG', 'HSDGp', 'HSDL', 'HSDLp', 'HVdAV', 'TI', 'closeH', 'closeL', 'daysTraded', 'first', 'last', 'lastDVdAV', 'lastDVotWk', 'lastTrdDoW', 'volAvg', 'volAvgWOhv', 'volHigh', 'wChng', 'wChngp', 'yearWeek']
['sc_code', 'yearWeek', 'closeL', 'closeH', 'volAvg', 'volHigh', 'daysTraded', 'HSDL', 'HSDG', 'HSDLp', 'HSDGp', 'first', 'last', 'wChng', 'wChngp', 'lastTrdDoW', 'TI', 'volAvgWOhv', 'HVdAV', 'CPveoHVD', 'lastDVotWk', 'lastDVdAV']


#### Quality & Tidiness

##### Quality
No quality issues found in original dataset


##### Tidiness

1. Ignore the unused columns no_trades, net_turnover, tdcloindi, isin
1. Rename column no_of_shrs to volumes
1. Create a new dataframe and add the new columns below:
    + closeH - Highest close in the week
    + closeL - Lowest close in the week
    + volHigh - Highest volume in the week
    + volAvg - Volume average
    + daysTraded - Number of days traded in the week
    + HSDL - Highest Single Day Loss
    + HSDG - Highest Single Day Gain
    + HSDLp - Highest Single Day Loss percent
    + HSDGp - Highest Single Day Gain percent
    + first - First close of the week
    + last - Last close of the week
    + wChng - Week change
    + wChngp - Week change percent
    + lastTrdDoW - Last traded day of week
    + TI - Times increased
    + volAvgWOhv - Volume average without high volume
    + HVdAV - High volume / Average volume(without highvolume)
    + CPveoHVD - Close positive on high volume day
    + lastDVotWk - Last day volume
    + lastDVdAV - Last day volume / average volume


#### Cleaning

##### Functions

In [88]:
# Mostly NumPy; the result is returned as a NumPy array
def weekly_trend_analysis_np(exchange, np_weekly_all, df_daily):

    if exchange == 'BSE':
        #ticker = df_daily.at[0,'sc_code']
        #ticker = df_daily.head(1)['sc_code'].values[0]
        ticker = df_daily['sc_code'].to_numpy()[0]
    else:
        #ticker = df_daily.at[0,'symbol']
        #ticker = df_daily.head(1)['symbol'].values[0]
        ticker = df_daily['symbol'].to_numpy()[0]

    arr_yearWeek = df_daily['yearWeek'].to_numpy()
    arr_close = df_daily['close'].to_numpy()
    arr_prevclose = df_daily['prevclose'].to_numpy()
    arr_chng = df_daily['chng'].to_numpy()
    arr_chngp = df_daily['chngp'].to_numpy()
    arr_ts = df_daily['ts'].to_numpy()
    arr_volumes = df_daily['volumes'].to_numpy()

    # Close
    arr_concat = np.column_stack((arr_yearWeek, arr_close))
    npi_gb = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

    #a = df_temp[['yearWeek', 'close']].to_numpy()
    yearWeek, daysTraded = np.unique(arr_concat[:,0], return_counts=True)
    
    cmaxs, cmins = [], []
    first, last, wChng, wChngp = [], [], [], []
    for idx,subarr in enumerate(npi_gb):
        cmaxs.append( np.amax(subarr) )
        cmins.append( np.amin(subarr) )
        first.append(subarr[0])
        last.append(subarr[-1])
        wChng.append( subarr[-1] - subarr[0] )
        wChngp.append( ( (subarr[-1] / subarr[0]) * 100) - 100 )

    #npi_gb.clear()
    arr_concat = np.empty((100,100))

    # Chng
    arr_concat = np.column_stack((arr_yearWeek, arr_chng))
    npi_gb = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

    HSDL, HSDG = [], []
    for idx,subarr in enumerate(npi_gb):
        HSDL.append( np.amin(subarr) )
        HSDG.append( np.amax(subarr) )

    #npi_gb.clear()
    arr_concat = np.empty((100,100))

    # Chngp
    arr_concat = np.column_stack((arr_yearWeek, arr_chngp))
    npi_gb = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

    HSDLp, HSDGp = [], []
    for idx,subarr in enumerate(npi_gb):
        HSDLp.append( np.amin(subarr) )
        HSDGp.append( np.amax(subarr) )

    #npi_gb.clear()
    arr_concat = np.empty((100,100))

    # Last Traded Date of the Week
    arr_concat = np.column_stack((arr_yearWeek, arr_ts))
    npi_gb = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

    lastTrdDoW = []
    for idx,subarr in enumerate(npi_gb):
        lastTrdDoW.append( subarr[-1] )
    
    #npi_gb.clear()
    arr_concat = np.empty((100,100))

    # Times increased
    TI = np.where(arr_close > arr_prevclose, 1, 0)

    # Below npi_gb_yearWeekTI is used in volumes section
    arr_concat = np.column_stack((arr_yearWeek, TI))
    npi_gb_yearWeekTI = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

    tempArr, TI = npi.group_by(arr_yearWeek).sum(TI)

    # Volume (depends on npi_gb_yearWeekTI from the section above, which is why it was moved from the top to here)
    arr_concat = np.column_stack((arr_yearWeek, arr_volumes))
    npi_gb = npi.group_by(arr_concat[:, 0]).split(arr_concat[:, 1])

    vmaxs, vavgs, volAvgWOhv, HVdAV, CPveoHVD, lastDVotWk, lastDVdAV = [], [], [], [], [], [], []
    for idx,subarr in enumerate(npi_gb):
        vavgs.append( np.mean(subarr) )
        ldvotWk = subarr[-1]
        lastDVotWk.append(ldvotWk)

        #print(idx, 'O - ',subarr, np.argmax(subarr), ', average : ',np.mean(subarr))
        ixDel = np.argmax(subarr)
        hV = subarr[ixDel]
        vmaxs.append( hV )

        if(len(subarr)>1):
            subarr = np.delete(subarr, ixDel)
            vawoHV = np.mean(subarr)
        else:
            vawoHV = np.mean(subarr)
        volAvgWOhv.append( vawoHV )
        HVdAV.append(hV / vawoHV)
        CPveoHVD.append( npi_gb_yearWeekTI[idx][ixDel] )
        lastDVdAV.append( np.round(ldvotWk / vawoHV, 2) )    

    #npi_gb.clear()
    arr_concat = np.empty((100,100))

    # Preparing the dataframe
    # yearWeek and occurrences
    #yearWeek, daysTraded = np.unique(a[:,0], return_counts=True)
    yearWeek = yearWeek.astype(int)
    HSDL = np.round(HSDL,2)
    HSDG = np.round(HSDG,2)
    HSDLp = np.round(HSDLp,2)
    HSDGp = np.round(HSDGp,2)

    first = np.round(first,2)
    last = np.round(last,2)
    wChng = np.round(wChng,2)
    wChngp = np.round(wChngp,2)

    vavgs = np.array(vavgs).astype(int)
    volAvgWOhv = np.array(volAvgWOhv).astype(int)
    HVdAV = np.round(HVdAV,2)

    ticker = np.full(yearWeek.shape[0], ticker)
    np_weekly = np.column_stack((ticker, yearWeek, lastTrdDoW, daysTraded, cmins, cmaxs, vavgs, vmaxs, HSDL
                               , HSDG, HSDLp, HSDGp, first, last, wChng, wChngp, TI, volAvgWOhv, HVdAV
                               , CPveoHVD, lastDVotWk, lastDVdAV))
    
    # Remove the latest yearWeek rows from np_weekly_all, as they could be partial, then append the newly computed np_weekly
    if len(np_weekly_all) > 0:
        #print(len(np_weekly_all))
        a = np_weekly_all[:,1] 
        b = np_weekly[:,1] 
        tf_1 = np.isin(a, b, invert=True) 
        #print(tf_1)
        t_result = list(compress(range(len(tf_1)), tf_1)) 
        #print(t_result)
        np_weekly_all = np_weekly_all[t_result]
        np_weekly_all = np.vstack((np_weekly_all, np_weekly))    
    else:
        np_weekly_all = []
        np_weekly_all = np.vstack((np_weekly))    
        
    return np_weekly_all


#### Stage 1 - Single Stock Test

In [62]:
df_bse_daily.columns

Index(['ts', 'sc_code', 'sc_name', 'sc_group', 'sc_type', 'open', 'high',
       'low', 'close', 'last', 'prevclose', 'no_trades', 'no_of_shrs',
       'net_turnover', 'tdcloindi', 'isin'],
      dtype='object')

In [13]:
print(df_bse_daily.shape)
print(df_bse_daily.iloc[-1]['ts'])

(657058, 16)
2020-03-27 00:00:00


In [63]:
#df_weekly = df_weekly[0:0]
df_weekly = pd.DataFrame() 
np_weekly_all = []
np_weekly = []

In [64]:
df_weekly.columns

Index([], dtype='object')

In [87]:
df_bse_bk = df_bse_daily.copy()

In [216]:
df_bse_daily = df_bse_bk.copy()

In [90]:
%%time
# Numpy
# weekly_trend_analysis = pandas = Wall time: 481 ms
# weekly_trend_analysis = numpy partial = Wall time: 179 ms
# weekly_trend_analysis = numpy partial = Wall time: 64 ms
exchange = 'BSE'
bse_csv_cols = ['ts', 'sc_code', 'sc_name', 'high', 'low', 'close', 'prevclose', 'no_of_shrs']
sc_code = 500002 #ABB
sc_code = 532370 #Ramco Systems
#df_weekly_all = pd.DataFrame() 

df_daily = df_bse_daily[df_bse_daily.sc_code == sc_code][bse_csv_cols].copy()
#df_daily = df_bse_5daily[df_bse_5daily.sc_code == sc_code][bse_csv_cols].head(2)

df_daily.rename(columns={"no_of_shrs": "volumes"}, inplace=True)

# yearWeek
df_daily['yearWeek'] = df_daily['ts'].dt.year * 100 + df_daily['ts'].dt.strftime('%U').astype(int)

# Convert ts from datetime64[ns] to a 'YYYY-MM-DD' string (object dtype), dropping the time part.
# This avoids an error in the last-traded-date section of weekly_trend_analysis_np()
df_daily['ts'] = df_daily['ts'].dt.strftime('%Y-%m-%d')

# Change & change percent
df_daily['chng'] = (df_daily['close'] - df_daily['prevclose'])
df_daily['chngp'] = np.round( ( (df_daily['close'] / df_daily['prevclose']) * 100 ) - 100, 2)

# FTA = First Time Analysis
#df_weekly2 = weekly_trend_analysis('sc_code', df_weekly, df_daily)
#df_weekly2 = weekly_trend_analysis(df_weekly, df_daily)
np_weekly = weekly_trend_analysis_np(exchange, np_weekly, df_daily)
print('Length of weekly = ',len(np_weekly))

cols = ['sc_code', 'yearWeek', 'lastTrdDoW', 'daysTraded', 'closeL', 'closeH', 'volAvg', 'volHigh'
         , 'HSDL', 'HSDG', 'HSDLp', 'HSDGp', 'first', 'last', 'wChng', 'wChngp', 'TI', 'volAvgWOhv', 'HVdAV'
         , 'CPveoHVD', 'lastDVotWk', 'lastDVdAV']

df_weekly = pd.DataFrame(data=np_weekly, columns=cols)

Length of weekly =  53
Wall time: 59 ms


In [208]:
df_weekly.head()

Unnamed: 0,sc_code,yearWeek,lastTrdDoW,daysTraded,closeL,closeH,volAvg,volHigh,HSDL,HSDG,HSDLp,HSDGp,first,last,wChng,wChngp,TI,volAvgWOhv,HVdAV,CPveoHVD,lastDVotWk,lastDVdAV
0,532370,201913,2019-04-05,5,240.05,253.6,3924,10344,-8.0,14.1,-3.15,5.89,253.6,240.05,-13.55,-5.34,1,2319,4.46,0,10344,4.46
1,532370,201914,2019-04-12,5,238.7,244.45,1737,2601,-5.5,4.4,-2.25,1.83,244.45,240.55,-3.9,-1.6,3,1521,1.71,0,527,0.35
2,532370,201915,2019-04-18,3,232.15,238.85,87011,258508,-6.6,-0.1,-2.76,-0.04,238.85,232.15,-6.7,-2.81,0,1263,204.68,0,1586,1.26
3,532370,201916,2019-04-26,5,226.0,230.05,1303,2569,-2.75,3.0,-1.2,1.33,230.05,226.25,-3.8,-1.65,1,987,2.6,0,538,0.54
4,532370,201917,2019-05-03,3,213.05,219.65,1288,1633,-6.6,5.35,-3.0,2.51,219.65,218.4,-1.25,-0.57,1,1116,1.46,1,1633,1.46
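
To make the `yearWeek` key concrete, here is a minimal sketch on a few hand-picked dates (the dates and the `df_demo` name are illustrative assumptions, not rows from the dataset). `%U` counts weeks starting on Sunday, and the days before the first Sunday of a year fall in week 0, so a calendar week that spans New Year ends up split across two keys.

```python
import pandas as pd

# Hypothetical sample dates: a normal week, the following week, and a year boundary.
df_demo = pd.DataFrame({"ts": pd.to_datetime(
    ["2019-04-01", "2019-04-05", "2019-04-12", "2019-12-31", "2020-01-01"])})

# Same expression as in the cell above: year * 100 + Sunday-based week number.
df_demo["yearWeek"] = (df_demo["ts"].dt.year * 100
                       + df_demo["ts"].dt.strftime("%U").astype(int))

year_weeks = df_demo["yearWeek"].tolist()
print(year_weeks)  # [201913, 201913, 201914, 201952, 202000]
```

Tuesday 2019-12-31 and Wednesday 2020-01-01 belong to the same trading week but get the keys `201952` and `202000`. Whether that matters depends on how the weekly rows are consumed downstream.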


### Step 3 : Define the data model

In [91]:
df_weekly.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53 entries, 0 to 52
Data columns (total 22 columns):
sc_code       53 non-null object
yearWeek      53 non-null object
lastTrdDoW    53 non-null object
daysTraded    53 non-null object
closeL        53 non-null object
closeH        53 non-null object
volAvg        53 non-null object
volHigh       53 non-null object
HSDL          53 non-null object
HSDG          53 non-null object
HSDLp         53 non-null object
HSDGp         53 non-null object
first         53 non-null object
last          53 non-null object
wChng         53 non-null object
wChngp        53 non-null object
TI            53 non-null object
volAvgWOhv    53 non-null object
HVdAV         53 non-null object
CPveoHVD      53 non-null object
lastDVotWk    53 non-null object
lastDVdAV     53 non-null object
dtypes: object(22)
memory usage: 4.6+ KB
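
Every column above is `object`, most likely because the weekly rows travel through a single NumPy array, which holds one dtype for all columns (the MemoryError message later in this notebook mentions `<U32`). A minimal sketch of casting the columns back, using the first six columns of the two rows shown in `df_weekly.head()`; the `numeric_cols` split and the `df_demo` name are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# String-typed array mimicking what a '<U32' NumPy array hands to pandas.
np_demo = np.array([
    ["532370", "201913", "2019-04-05", "5", "240.05", "253.6"],
    ["532370", "201914", "2019-04-12", "5", "238.7", "244.45"],
])
demo_cols = ["sc_code", "yearWeek", "lastTrdDoW", "daysTraded", "closeL", "closeH"]
df_demo = pd.DataFrame(data=np_demo, columns=demo_cols)

# Assumed split: everything except the date column is numeric.
numeric_cols = [c for c in demo_cols if c != "lastTrdDoW"]
df_demo[numeric_cols] = df_demo[numeric_cols].apply(pd.to_numeric)
df_demo["lastTrdDoW"] = pd.to_datetime(df_demo["lastTrdDoW"])

print(df_demo.dtypes)
```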


### Step 4 : Run ETL

In [107]:
df_bse_daily.head()

Unnamed: 0,ts,sc_code,sc_name,sc_group,sc_type,open,high,low,close,last,prevclose,no_trades,no_of_shrs,net_turnover,tdcloindi,isin
0,2019-04-01,500002,ABB LTD.,A,Q,1319.85,1326.6,1306.35,1313.55,1310.05,1319.85,548,7785,10248104,\N,INE117A01022
1,2019-04-01,500003,AEGIS LOGIS,A,Q,204.65,208.45,204.0,206.65,206.15,202.95,178,6150,1266611,\N,INE208C01025
2,2019-04-01,500008,AMAR RAJA BA,A,Q,717.05,732.15,717.05,723.05,723.05,718.95,550,21641,15708848,\N,INE885A01032
3,2019-04-01,500009,A.SARABHAI,X,Q,14.5,14.87,13.41,13.53,13.53,14.29,153,46296,645095,\N,INE432A01017
4,2019-04-01,500010,HDFC,A,Q,1979.0,1979.0,1937.65,1955.25,1943.4,1967.3,2736,707019,1390990348,\N,INE001A01036


#### Stage 2 - Loop Test

In [92]:
#df_weekly = df_weekly[0:0]
df_weekly_all = pd.DataFrame() 

In [93]:
df_weekly_all.columns

Index([], dtype='object')

In [94]:
df_bse_daily.columns

Index(['ts', 'sc_code', 'sc_name', 'sc_group', 'sc_type', 'open', 'high',
       'low', 'close', 'last', 'prevclose', 'no_trades', 'no_of_shrs',
       'net_turnover', 'tdcloindi', 'isin'],
      dtype='object')

In [35]:
print ('Current date and time : {}'.format(dt.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))

Current date and time : 2020-04-07 22:15:26


**Write to multiple CSV files to avoid the memory error**

In [217]:
%%time
# Numpy
# For full BSE run it took Wall time: 1h 26min 31s using pandas
# For full BSE run it took Wall time: 38m 27s using partial numpy

print ('Start date and time : {}'.format(dt.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))

np_weekly_all_new = []
filena = ''
exchange = 'BSE'
#sc_code = 500002 #ABB
#sc_code = 532370 #Ramco Systems
ticker_col = 'sc_code'
bse_csv_cols = ['ts', 'sc_code', 'sc_name', 'high', 'low', 'close', 'prevclose', 'no_of_shrs']
cols = ['sc_code', 'yearWeek', 'lastTrdDoW', 'daysTraded', 'closeL', 'closeH', 'volAvg', 'volHigh'
         , 'HSDL', 'HSDG', 'HSDLp', 'HSDGp', 'first', 'last', 'wChng', 'wChngp', 'TI', 'volAvgWOhv', 'HVdAV'
         , 'CPveoHVD', 'lastDVotWk', 'lastDVdAV']

ticker_list = df_bse_daily['sc_code'].unique()
print('Total sc_code = {} to process'.format(len(ticker_list)))

# Removing unused columns 
df_bse_daily = df_bse_daily[bse_csv_cols].copy()
print ('df_bse_daily : date and time : {}'.format(dt.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))

# changing volume column name
df_bse_daily.rename(columns={"no_of_shrs": "volumes"}, inplace=True)
print ('volume renamed : date and time : {}'.format(dt.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))

# yearWeek
df_bse_daily['yearWeek'] = df_bse_daily['ts'].dt.year * 100 + df_bse_daily['ts'].dt.strftime('%U').astype(int)
print ('added yearWeek : date and time : {}'.format(dt.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))

# Convert ts from datetime64[ns] to a 'YYYY-MM-DD' string, dropping the time part.
# This avoids an error in the last-traded-date section of weekly_trend_analysis()
# when two numpy arrays are concatenated
df_bse_daily['ts'] = df_bse_daily['ts'].dt.strftime('%Y-%m-%d')
print ('ts truncated : date and time : {}'.format(dt.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))

# Change & change percent
df_bse_daily['chng'] = (df_bse_daily['close'] - df_bse_daily['prevclose'])
df_bse_daily['chngp'] = np.round( ( (df_bse_daily['close'] / df_bse_daily['prevclose']) * 100 ) - 100, 2)
print ('added chng & chngp : date and time : {}'.format(dt.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))

for idx, ticker in enumerate(ticker_list):
    #print(ticker)   
    a, b, tf_1, t_result, np_weekly, np_temp = [], [], [], [], [], []
    
    if len(np_weekly_all) > 0:
        a = np_weekly_all[:,0] 
        b = np.array([ticker])
        tf_1 = np.isin(a, b) 
        t_result = list(compress(range(len(tf_1)), tf_1)) 
        np_weekly = np_weekly_all[t_result]
    
    # Printed for every ticker on the first run, and later for newly listed (IPO) companies
    if len(np_weekly) == 0:
        print('{}. Weekly empty for {}'.format(idx, ticker))
    
    df_daily = df_bse_daily[df_bse_daily.sc_code == ticker]
    
    #df_daily = df_bse_5daily[df_bse_5daily.sc_code == sc_code][['ts', 'sc_code', 'sc_name', 'high', 'low', 'close', 'prevclose', 'no_of_shrs']]    
    
    np_temp = weekly_trend_analysis_np(exchange, np_weekly, df_daily)
    
    # First batch only: np_weekly_all_new is empty during FTA (First Time Analysis) and right after each save
    if len(np_weekly_all_new) == 0:
        #df_weekly_all = pd.DataFrame(columns=list(df_temp.columns))       
        np_weekly_all_new = np.vstack((np_temp))
    else:
        np_weekly_all_new = np.vstack((np_weekly_all_new, np_temp))
        #a = np_weekly_all[:,0] 
        #b = np.array([ticker])
        #tf_1 = np.isin(a, b, invert=True) 
        #t_result = list(compress(range(len(tf_1)), tf_1)) 
        #np_weekly_all = np_weekly_all[t_result]        
        #np_weekly_all = np.vstack((np_weekly_all, np_temp))            

    if (idx%500 == 0) and (idx > 0):
        filena = 'df_bse_weekly_all-C{}.csv'.format(idx)
        print('Saving to file ',filena)
        df_weekly_all = pd.DataFrame(data=np_weekly_all_new, columns=cols)
        # Write BSE Weekly ALL
        df_weekly_all.to_csv(os.path.join(os.getcwd(), '..', '5. BTD', 'data', filena)
                     , encoding='utf-8', index=False)        
        df_weekly_all = None
        np_weekly_all_new = []
        
print('Length of weekly = ',len(np_weekly_all_new))
df_weekly_all = pd.DataFrame(data=np_weekly_all_new, columns=cols)
filena = 'df_bse_weekly_all-C{}.csv'.format(idx)
df_weekly_all.to_csv(os.path.join(os.getcwd(), '..', '5. BTD', 'data', filena)
                     , encoding='utf-8', index=False)        


Start date and time : 2020-04-08 00:48:29
Total sc_code = 4298 to process
df_bse_daily : date and time : 2020-04-08 00:48:29
volume renamed : date and time : 2020-04-08 00:48:29
added yearWeek : date and time : 2020-04-08 00:48:42
ts truncated : date and time : 2020-04-08 00:48:55
added chng & chngp : date and time : 2020-04-08 00:48:55
0. Weekly empty for 500002
Saving to file  df_bse_weekly_all-C0.csv
1. Weekly empty for 500003
2. Weekly empty for 500008
3. Weekly empty for 500009
4. Weekly empty for 500010
5. Weekly empty for 500012
6. Weekly empty for 500013
7. Weekly empty for 500014
8. Weekly empty for 500016
9. Weekly empty for 500020
10. Weekly empty for 500023
11. Weekly empty for 500024
12. Weekly empty for 500027
13. Weekly empty for 500028
14. Weekly empty for 500029
15. Weekly empty for 500031
16. Weekly empty for 500032
17. Weekly empty for 500033
18. Weekly empty for 500034
19. Weekly empty for 500038
20. Weekly empty for 500039
21. Weekly empty for 500040
22. Weekly emp

274. Weekly empty for 500878
275. Weekly empty for 500890
276. Weekly empty for 500940
277. Weekly empty for 501148
278. Weekly empty for 501150
279. Weekly empty for 501179
280. Weekly empty for 501242
281. Weekly empty for 501298
282. Weekly empty for 501301
283. Weekly empty for 501314
284. Weekly empty for 501343
285. Weekly empty for 501370
286. Weekly empty for 501421
287. Weekly empty for 501423
288. Weekly empty for 501425
289. Weekly empty for 501430
290. Weekly empty for 501455
291. Weekly empty for 501700
292. Weekly empty for 501831
293. Weekly empty for 501833
294. Weekly empty for 501848
295. Weekly empty for 502015
296. Weekly empty for 502090
297. Weekly empty for 502137
298. Weekly empty for 502157
299. Weekly empty for 502168
300. Weekly empty for 502175
301. Weekly empty for 502180
302. Weekly empty for 502219
303. Weekly empty for 502281
304. Weekly empty for 502330
305. Weekly empty for 502355
306. Weekly empty for 502407
307. Weekly empty for 502420
308. Weekly em

559. Weekly empty for 509438
560. Weekly empty for 509472
561. Weekly empty for 509480
562. Weekly empty for 509486
563. Weekly empty for 509488
564. Weekly empty for 509496
565. Weekly empty for 509525
566. Weekly empty for 509557
567. Weekly empty for 509567
568. Weekly empty for 509597
569. Weekly empty for 509631
570. Weekly empty for 509635
571. Weekly empty for 509675
572. Weekly empty for 509692
573. Weekly empty for 509709
574. Weekly empty for 509715
575. Weekly empty for 509820
576. Weekly empty for 509835
577. Weekly empty for 509874
578. Weekly empty for 509895
579. Weekly empty for 509930
580. Weekly empty for 509966
581. Weekly empty for 510245
582. Weekly empty for 511012
583. Weekly empty for 511034
584. Weekly empty for 511064
585. Weekly empty for 511066
586. Weekly empty for 511072
587. Weekly empty for 511076
588. Weekly empty for 511092
589. Weekly empty for 511110
590. Weekly empty for 511116
591. Weekly empty for 511131
592. Weekly empty for 511144
593. Weekly em

843. Weekly empty for 517417
844. Weekly empty for 517421
845. Weekly empty for 517429
846. Weekly empty for 517437
847. Weekly empty for 517447
848. Weekly empty for 517449
849. Weekly empty for 517467
850. Weekly empty for 517477
851. Weekly empty for 517494
852. Weekly empty for 517498
853. Weekly empty for 517500
854. Weekly empty for 517506
855. Weekly empty for 517514
856. Weekly empty for 517518
857. Weekly empty for 517522
858. Weekly empty for 517530
859. Weekly empty for 517536
860. Weekly empty for 517544
861. Weekly empty for 517546
862. Weekly empty for 517548
863. Weekly empty for 517554
864. Weekly empty for 517556
865. Weekly empty for 517562
866. Weekly empty for 517569
867. Weekly empty for 518011
868. Weekly empty for 518029
869. Weekly empty for 518091
870. Weekly empty for 519003
871. Weekly empty for 519031
872. Weekly empty for 519091
873. Weekly empty for 519097
874. Weekly empty for 519105
875. Weekly empty for 519126
876. Weekly empty for 519136
877. Weekly em

1123. Weekly empty for 524408
1124. Weekly empty for 524412
1125. Weekly empty for 524440
1126. Weekly empty for 524444
1127. Weekly empty for 524458
1128. Weekly empty for 524470
1129. Weekly empty for 524480
1130. Weekly empty for 524488
1131. Weekly empty for 524494
1132. Weekly empty for 524500
1133. Weekly empty for 524506
1134. Weekly empty for 524518
1135. Weekly empty for 524520
1136. Weekly empty for 524522
1137. Weekly empty for 524534
1138. Weekly empty for 524542
1139. Weekly empty for 524558
1140. Weekly empty for 524570
1141. Weekly empty for 524576
1142. Weekly empty for 524582
1143. Weekly empty for 524594
1144. Weekly empty for 524598
1145. Weekly empty for 524606
1146. Weekly empty for 524634
1147. Weekly empty for 524640
1148. Weekly empty for 524648
1149. Weekly empty for 524652
1150. Weekly empty for 524661
1151. Weekly empty for 524663
1152. Weekly empty for 524667
1153. Weekly empty for 524669
1154. Weekly empty for 524675
1155. Weekly empty for 524687
1156. Week

1397. Weekly empty for 530871
1398. Weekly empty for 530879
1399. Weekly empty for 530883
1400. Weekly empty for 530885
1401. Weekly empty for 530889
1402. Weekly empty for 530897
1403. Weekly empty for 530919
1404. Weekly empty for 530931
1405. Weekly empty for 530943
1406. Weekly empty for 530951
1407. Weekly empty for 530959
1408. Weekly empty for 530961
1409. Weekly empty for 530965
1410. Weekly empty for 530977
1411. Weekly empty for 530979
1412. Weekly empty for 530985
1413. Weekly empty for 530991
1414. Weekly empty for 530999
1415. Weekly empty for 531041
1416. Weekly empty for 531067
1417. Weekly empty for 531082
1418. Weekly empty for 531083
1419. Weekly empty for 531092
1420. Weekly empty for 531102
1421. Weekly empty for 531109
1422. Weekly empty for 531120
1423. Weekly empty for 531129
1424. Weekly empty for 531146
1425. Weekly empty for 531153
1426. Weekly empty for 531158
1427. Weekly empty for 531161
1428. Weekly empty for 531162
1429. Weekly empty for 531163
1430. Week

1669. Weekly empty for 532365
1670. Weekly empty for 532366
1671. Weekly empty for 532368
1672. Weekly empty for 532369
1673. Weekly empty for 532370
1674. Weekly empty for 532371
1675. Weekly empty for 532372
1676. Weekly empty for 532373
1677. Weekly empty for 532374
1678. Weekly empty for 532375
1679. Weekly empty for 532376
1680. Weekly empty for 532378
1681. Weekly empty for 532379
1682. Weekly empty for 532380
1683. Weekly empty for 532382
1684. Weekly empty for 532384
1685. Weekly empty for 532386
1686. Weekly empty for 532387
1687. Weekly empty for 532388
1688. Weekly empty for 532390
1689. Weekly empty for 532391
1690. Weekly empty for 532392
1691. Weekly empty for 532395
1692. Weekly empty for 532397
1693. Weekly empty for 532400
1694. Weekly empty for 532402
1695. Weekly empty for 532404
1696. Weekly empty for 532406
1697. Weekly empty for 532407
1698. Weekly empty for 532408
1699. Weekly empty for 532410
1700. Weekly empty for 532411
1701. Weekly empty for 532413
1702. Week

1943. Weekly empty for 532869
1944. Weekly empty for 532871
1945. Weekly empty for 532872
1946. Weekly empty for 532873
1947. Weekly empty for 532874
1948. Weekly empty for 532875
1949. Weekly empty for 532878
1950. Weekly empty for 532879
1951. Weekly empty for 532880
1952. Weekly empty for 532883
1953. Weekly empty for 532884
1954. Weekly empty for 532885
1955. Weekly empty for 532886
1956. Weekly empty for 532888
1957. Weekly empty for 532889
1958. Weekly empty for 532890
1959. Weekly empty for 532891
1960. Weekly empty for 532892
1961. Weekly empty for 532893
1962. Weekly empty for 532894
1963. Weekly empty for 532895
1964. Weekly empty for 532898
1965. Weekly empty for 532899
1966. Weekly empty for 532900
1967. Weekly empty for 532902
1968. Weekly empty for 532904
1969. Weekly empty for 532906
1970. Weekly empty for 532907
1971. Weekly empty for 532911
1972. Weekly empty for 532914
1973. Weekly empty for 532915
1974. Weekly empty for 532916
1975. Weekly empty for 532918
1976. Week



2214. Weekly empty for 534675
2215. Weekly empty for 534680
2216. Weekly empty for 534690
2217. Weekly empty for 534691
2218. Weekly empty for 534708
2219. Weekly empty for 534731
2220. Weekly empty for 534734
2221. Weekly empty for 534741
2222. Weekly empty for 534742
2223. Weekly empty for 534748
2224. Weekly empty for 534756
2225. Weekly empty for 534758
2226. Weekly empty for 534804
2227. Weekly empty for 534809
2228. Weekly empty for 534816
2229. Weekly empty for 534920
2230. Weekly empty for 534976
2231. Weekly empty for 535141
2232. Weekly empty for 535205
2233. Weekly empty for 535267
2234. Weekly empty for 535276
2235. Weekly empty for 535322
2236. Weekly empty for 535458
2237. Weekly empty for 535467
2238. Weekly empty for 535601
2239. Weekly empty for 535602
2240. Weekly empty for 535620
2241. Weekly empty for 535621
2242. Weekly empty for 535648
2243. Weekly empty for 535658
2244. Weekly empty for 535667
2245. Weekly empty for 535693
2246. Weekly empty for 535694
2247. Week

2488. Weekly empty for 539871
2489. Weekly empty for 539874
2490. Weekly empty for 539876
2491. Weekly empty for 539883
2492. Weekly empty for 539884
2493. Weekly empty for 539889
2494. Weekly empty for 539917
2495. Weekly empty for 539921
2496. Weekly empty for 539938
2497. Weekly empty for 539939
2498. Weekly empty for 539940
2499. Weekly empty for 539944
2500. Weekly empty for 539945
Saving to file  df_bse_weekly_all-C2500.csv
2501. Weekly empty for 539956
2502. Weekly empty for 539957
2503. Weekly empty for 539962
2504. Weekly empty for 539978
2505. Weekly empty for 539979
2506. Weekly empty for 539980
2507. Weekly empty for 539981
2508. Weekly empty for 539982
2509. Weekly empty for 539984
2510. Weekly empty for 539992
2511. Weekly empty for 540005
2512. Weekly empty for 540006
2513. Weekly empty for 540023
2514. Weekly empty for 540024
2515. Weekly empty for 540025
2516. Weekly empty for 540027
2517. Weekly empty for 540047
2518. Weekly empty for 540048
2519. Weekly empty for 540

2760. Weekly empty for 590056
2761. Weekly empty for 590057
2762. Weekly empty for 590062
2763. Weekly empty for 590065
2764. Weekly empty for 590066
2765. Weekly empty for 590068
2766. Weekly empty for 590070
2767. Weekly empty for 590071
2768. Weekly empty for 590072
2769. Weekly empty for 590073
2770. Weekly empty for 590075
2771. Weekly empty for 590078
2772. Weekly empty for 590086
2773. Weekly empty for 590095
2774. Weekly empty for 590096
2775. Weekly empty for 590097
2776. Weekly empty for 590098
2777. Weekly empty for 590099
2778. Weekly empty for 590101
2779. Weekly empty for 590103
2780. Weekly empty for 590104
2781. Weekly empty for 590106
2782. Weekly empty for 590107
2783. Weekly empty for 590108
2784. Weekly empty for 590109
2785. Weekly empty for 590110
2786. Weekly empty for 590113
2787. Weekly empty for 590115
2788. Weekly empty for 590126
2789. Weekly empty for 590134
2790. Weekly empty for 890144
2791. Weekly empty for 500394
2792. Weekly empty for 501391
2793. Week

3036. Weekly empty for 519224
3037. Weekly empty for 519279
3038. Weekly empty for 519367
3039. Weekly empty for 519494
3040. Weekly empty for 519604
3041. Weekly empty for 519612
3042. Weekly empty for 521054
3043. Weekly empty for 521127
3044. Weekly empty for 521220
3045. Weekly empty for 522292
3046. Weekly empty for 523105
3047. Weekly empty for 523116
3048. Weekly empty for 523415
3049. Weekly empty for 523419
3050. Weekly empty for 523519
3051. Weekly empty for 523558
3052. Weekly empty for 523652
3053. Weekly empty for 524632
3054. Weekly empty for 526027
3055. Weekly empty for 526159
3056. Weekly empty for 526193
3057. Weekly empty for 526217
3058. Weekly empty for 526285
3059. Weekly empty for 526622
3060. Weekly empty for 526662
3061. Weekly empty for 526689
3062. Weekly empty for 526711
3063. Weekly empty for 526859
3064. Weekly empty for 526871
3065. Weekly empty for 526965
3066. Weekly empty for 530065
3067. Weekly empty for 530173
3068. Weekly empty for 530245
3069. Week

3310. Weekly empty for 512020
3311. Weekly empty for 512449
3312. Weekly empty for 512591
3313. Weekly empty for 512604
3314. Weekly empty for 513309
3315. Weekly empty for 513418
3316. Weekly empty for 513430
3317. Weekly empty for 513544
3318. Weekly empty for 516110
3319. Weekly empty for 519191
3320. Weekly empty for 519242
3321. Weekly empty for 521068
3322. Weekly empty for 521178
3323. Weekly empty for 523054
3324. Weekly empty for 523373
3325. Weekly empty for 524314
3326. Weekly empty for 526161
3327. Weekly empty for 526237
3328. Weekly empty for 526301
3329. Weekly empty for 526445
3330. Weekly empty for 526967
3331. Weekly empty for 530109
3332. Weekly empty for 530197
3333. Weekly empty for 530581
3334. Weekly empty for 530907
3335. Weekly empty for 531323
3336. Weekly empty for 531825
3337. Weekly empty for 532007
3338. Weekly empty for 532140
3339. Weekly empty for 532172
3340. Weekly empty for 532340
3341. Weekly empty for 532723
3342. Weekly empty for 532877
3343. Week

3856. Weekly empty for 532100
3857. Weekly empty for 537582
3858. Weekly empty for 538557
3859. Weekly empty for 539121
3860. Weekly empty for 500030
3861. Weekly empty for 511000
3862. Weekly empty for 530571
3863. Weekly empty for 530615
3864. Weekly empty for 531221
3865. Weekly empty for 531930
3866. Weekly empty for 539533
3867. Weekly empty for 540579
3868. Weekly empty for 508571
3869. Weekly empty for 513460
3870. Weekly empty for 523164
3871. Weekly empty for 524156
3872. Weekly empty for 535142
3873. Weekly empty for 541285
3874. Weekly empty for 530407
3875. Weekly empty for 531758
3876. Weekly empty for 539401
3877. Weekly empty for 542682
3878. Weekly empty for 504080
3879. Weekly empty for 533204
3880. Weekly empty for 542677
3881. Weekly empty for 530049
3882. Weekly empty for 530669
3883. Weekly empty for 531126
3884. Weekly empty for 542446
3885. Weekly empty for 531033
3886. Weekly empty for 534639
3887. Weekly empty for 538794
3888. Weekly empty for 540001
3889. Week

4130. Weekly empty for 506313
4131. Weekly empty for 523351
4132. Weekly empty for 538926
4133. Weekly empty for 539091
4134. Weekly empty for 532993
4135. Weekly empty for 539116
4136. Weekly empty for 542646
4137. Weekly empty for 530909
4138. Weekly empty for 542857
4139. Weekly empty for 502850
4140. Weekly empty for 531260
4141. Weekly empty for 540140
4142. Weekly empty for 542846
4143. Weekly empty for 542862
4144. Weekly empty for 519439
4145. Weekly empty for 521238
4146. Weekly empty for 539330
4147. Weekly empty for 514238
4148. Weekly empty for 501311
4149. Weekly empty for 511401
4150. Weekly empty for 512600
4151. Weekly empty for 519353
4152. Weekly empty for 530055
4153. Weekly empty for 531314
4154. Weekly empty for 531738
4155. Weekly empty for 506975
4156. Weekly empty for 542847
4157. Weekly empty for 538382
4158. Weekly empty for 500458
4159. Weekly empty for 531029
4160. Weekly empty for 542863
4161. Weekly empty for 542864
4162. Weekly empty for 531049
4163. Week
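
The loop above filters the full frame once per ticker (`df_bse_daily[df_bse_daily.sc_code == ticker]`), which scans every row each time. One possible alternative, sketched here under assumptions, is to let `groupby` split the frame once and to flush the accumulated rows every few tickers. `summarise_week` is a hypothetical stand-in for `weekly_trend_analysis_np`, the data is made up, and the in-memory `chunks` list replaces the real CSV writes.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for df_bse_daily.
df_demo = pd.DataFrame({
    "sc_code":  [500002, 500002, 500003, 500003, 500008],
    "yearWeek": [201913, 201913, 201913, 201914, 201913],
    "close":    [1313.55, 1320.0, 206.65, 210.0, 723.05],
})

def summarise_week(df_one_ticker):
    """Hypothetical stand-in for weekly_trend_analysis_np: one row per week."""
    out = df_one_ticker.groupby("yearWeek")["close"].agg(["min", "max"]).reset_index()
    out.insert(0, "sc_code", df_one_ticker["sc_code"].iloc[0])
    return out.to_numpy()

chunk_size = 2           # illustrative; the notebook uses a much larger value
pending, chunks = [], []

# groupby splits the frame once instead of scanning it for every ticker.
for idx, (ticker, df_one) in enumerate(df_demo.groupby("sc_code", sort=False), start=1):
    pending.append(summarise_week(df_one))
    if idx % chunk_size == 0:
        chunks.append(np.vstack(pending))    # real code would write a CSV here
        pending = []

if pending:                                   # flush the final partial chunk
    chunks.append(np.vstack(pending))

print([c.shape for c in chunks])  # [(3, 4), (1, 4)]
```

Collecting per-ticker arrays in a list and calling `np.vstack` once per chunk also avoids re-copying the growing array on every iteration.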

**Checking memory capacity**

In [214]:
np_weekly_all_new = []

In [139]:
# Encountered error : MemoryError: Unable to allocate array with shape (82912, 22) and data type <U32
a = []
a = np.zeros((82912, 22), dtype='<U32')
#a = np.zeros((50000,22), dtype='<U32')
a.nbytes
#303755101056


233480192
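
The figure above can be checked without allocating anything: a `<U32` element is 32 UCS-4 characters at 4 bytes each, so the array needs rows × columns × 128 bytes. A minimal sketch (the variable names are illustrative):

```python
import numpy as np

rows, n_cols = 82912, 22                  # shape from the MemoryError message
itemsize = np.dtype("<U32").itemsize      # 32 characters * 4 bytes = 128
estimated_bytes = rows * n_cols * itemsize

print(estimated_bytes)                    # 233480192, matching a.nbytes above
```

Fixed-width unicode pays for all 32 characters in every cell, including short values such as `5`, which is one reason writing in chunks keeps the footprint down.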

**df column to list**

In [251]:
df_bse_daily['chngp'].head(10).tolist()

[-0.48, 1.82, 0.57, -5.32, -0.61, 2.47, -0.09, 5.0, 4.52, 1.33]

**Generating some test data for Stack Overflow**

In [252]:
ts = ['2019-04-01 00:00:00','2019-04-01 00:00:00','2019-04-01 00:00:00','2019-04-01 00:00:00','2019-04-01 00:00:00','2019-04-02 00:00:00','2019-04-02 00:00:00','2019-04-02 00:00:00','2019-04-02 00:00:00','2019-04-02 00:00:00']
sc_code = ['500002','500002','500002','500002','500002','500002','500002','500002','500002','500002']
high = [1326.6, 208.45, 732.15, 14.87, 1979.0, 57.8, 11.55, 1.68, 8.1, 139.4]
low = [1306.35, 204.0, 717.05, 13.41, 1937.65, 54.65, 11.2, 1.52, 7.75, 135.65]
close = [1313.55, 206.65, 723.05, 13.53, 1955.25, 56.0, 11.21, 1.68, 8.1, 136.85]
prevclose = [1319.85, 202.95, 718.95, 14.29, 1967.3, 54.65, 11.22, 1.6, 7.75, 135.05]
volumes = [7785, 6150, 21641, 46296, 707019, 40089, 25300, 5920, 500, 235355]
yearWeek = [201913, 201913, 201913, 201913, 201913, 201913, 201913, 201913, 201913, 201913]
chng = [-6.29, 3.70, 4.09, -0.75, -12.04, 1.35, -0.09, 0.079, 0.34, 1.79]
chngp = [-0.48, 1.82, 0.57, -5.32, -0.61, 2.47, -0.09, 5.0, 4.52, 1.33]


dict_temp = {'ts':ts, 'sc_code':sc_code, 'high':high, 'low':low, 'close':close, 'prevclose':prevclose, 'volumes':volumes, 'yearWeek':yearWeek, 'chng':chng, 'chngp':chngp}
df_weekly = pd.DataFrame(data=dict_temp)


In [253]:
df_weekly.head()

Unnamed: 0,chng,chngp,close,high,low,prevclose,sc_code,ts,volumes,yearWeek
0,-6.29,-0.48,1313.55,1326.6,1306.35,1319.85,500002,2019-04-01 00:00:00,7785,201913
1,3.7,1.82,206.65,208.45,204.0,202.95,500002,2019-04-01 00:00:00,6150,201913
2,4.09,0.57,723.05,732.15,717.05,718.95,500002,2019-04-01 00:00:00,21641,201913
3,-0.75,-5.32,13.53,14.87,13.41,14.29,500002,2019-04-01 00:00:00,46296,201913
4,-12.04,-0.61,1955.25,1979.0,1937.65,1967.3,500002,2019-04-01 00:00:00,707019,201913


**Identifying empty df**

In [149]:
if df_daily.empty:
    print('DataFrame is empty!')
else:
    print('DataFrame is not empty!')

DataFrame is not empty!


**Processing a list**

In [130]:
ticker_list = df_bse_daily['sc_code'].unique()
print(len(ticker_list))
for ticker in ticker_list:
    #print(ticker)
    pass

4298


**Removing common items**

In [95]:
def remove_common_items(list1, list2): 
    """
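    Return the items that appear in only one of the two lists
    (the symmetric difference), keeping their original order.

    Parameters: 
    list1 (list): first list
    list2 (list): second list

    Returns: 
    list: items of list1 not in list2, followed by items of list2 not in list1
    
    Sample : print(remove_common_items(keep_columns, list(df_jan.columns)))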
    """                
    list_dif = [i for i in list1 + list2 if i not in list1 or i not in list2]
    return list_dif

In [98]:
list1 = [*range(1, 10, 1)] 
list2 = [*range(5, 15, 1)]
print(list1)
print(list2)
print("Removed common items =",remove_common_items(list1, list2))

# set.difference is one-sided: it keeps only the items of list1 that are not in list2
a = set(list1)
b = set(list2)
print("SET Difference = ",list(a.difference(b)))

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
Removed common items = [1, 2, 3, 4, 10, 11, 12, 13, 14]
SET Difference =  [1, 2, 3, 4]
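
`remove_common_items` computes what set theory calls the symmetric difference, whereas `a.difference(b)` is one-sided. A short sketch of the set-based equivalent, assuming the items are hashable and that duplicates and the original order do not matter:

```python
list1 = list(range(1, 10))
list2 = list(range(5, 15))

# Items that appear in exactly one of the two lists.
sym_diff = sorted(set(list1) ^ set(list2))

# One-sided differences, for comparison.
only_in_list1 = sorted(set(list1) - set(list2))
only_in_list2 = sorted(set(list2) - set(list1))

print(sym_diff)                      # [1, 2, 3, 4, 10, 11, 12, 13, 14]
print(only_in_list1, only_in_list2)  # [1, 2, 3, 4] [10, 11, 12, 13, 14]
```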


### Step 5 : Write project write-up

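The hype about NumPy is real: the single-stock test dropped from 481 ms with pandas to about 59 ms, and the full BSE run from 1h 26min 31s to 38m 27s with a partial NumPy rewrite. Coding-wise, though, it is noticeably harder to work with than pandas.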

#### Learnings
* Use `.iloc` to read a value from a specific cell by position: `last_week = df_weekly2.iloc[-1]['yearWeek'].astype(int)`
* Use `.loc` to assign a value to a cell
* For single-cell access, `.at` and `.iat` perform better than `.loc` and `.iloc`: `ticker = df_daily.at[0,'symbol']`
    - [Poor performance for .loc and .iloc compared to .ix #6683](https://github.com/pandas-dev/pandas/issues/6683)
* Emptying a dataframe: `df[0:0]`
* Checking if a dataframe is empty
```
if df_weekly.empty:
    print('DataFrame is empty!')
```
* `pd.concat` emitted a FutureWarning about sorting; passing `sort` explicitly avoids it
```
pd.concat([df], sort=True)   # sorts column names
pd.concat([df], sort=False)  # doesn't sort column names
```
* DataFrame to a 2D NumPy array
```
df_temp[['yearWeek', 'close']].values
```
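
The access patterns listed above can be tried together on a small frame. This is a sketch on made-up data (the `df_demo` name and values are illustrative, with column names borrowed from `df_weekly`):

```python
import pandas as pd

df_demo = pd.DataFrame({"yearWeek": [201913, 201914, 201915],
                        "close": [240.05, 240.55, 232.15]})

last_week = int(df_demo.iloc[-1]["yearWeek"])      # read by position
first_close = df_demo.at[0, "close"]               # fast scalar read by label
df_demo.at[2, "close"] = 233.0                     # scalar assignment
arr = df_demo[["yearWeek", "close"]].to_numpy()    # 2D NumPy array
df_empty = df_demo[0:0]                            # same columns, zero rows

print(last_week, first_close, arr.shape, df_empty.empty)
```
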
#### Resources
* [Python - How to check list monotonicity](https://stackoverflow.com/questions/4983258/python-how-to-check-list-monotonicity)
* [How to delete multiple rows of NumPy array?](https://stackoverflow.com/questions/47622139/how-to-delete-multiple-rows-of-numpy-array)