# Capella manuscript analysis

In [1]:
%load_ext autoreload

Load modules with helper functions.

In [2]:
%rehashx
%matplotlib inline
%autoreload 2

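import bokeh
# Note: bokeh.charts was removed from later Bokeh releases (it moved to the
# separate bkcharts package), so this notebook needs an older Bokeh version.
from bokeh.charts import Bar, output_notebook, show
from bokeh.layouts import row

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from medDiaJson import *
from roman_date import *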

In [3]:
output_notebook()

## Build dataframe

First, build the full dataframe for all manuscripts in the JSON file. The JSON file itself is loaded inside the module.

In [4]:
df = medDiaCon(237)

Convert the date entries from Roman numerals to Arabic numerals. Only the first Roman numeral in each entry is evaluated; 
any additional information in the entry is ignored. The result is multiplied by 100, so a century number becomes a year value.

In [5]:
df['date'] = df['date'].apply(lambda row: from_roman(row)*100)
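
The conversion can be illustrated with a minimal sketch. The actual `from_roman` lives in the `roman_date` module and is not shown in this notebook; `first_roman_to_int` and `century_to_year` below are hypothetical stand-ins, and the requirement that the numeral appears as a separate uppercase token is an assumption.

```python
import re

ROMAN_VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}

def first_roman_to_int(text):
    """Hypothetical helper: convert the first Roman numeral token in text."""
    # Assumption: the numeral is an uppercase token delimited by word boundaries.
    match = re.search(r'\b[IVXLCDM]+\b', text)
    if match is None:
        raise ValueError('no Roman numeral found in %r' % text)
    numeral = match.group(0)
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total

def century_to_year(text):
    """Turn a century numeral into a year value, as done for the date column."""
    return first_roman_to_int(text) * 100
```

With this sketch an entry such as 'IX-X' maps to 900, because everything after the first numeral is ignored.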

List all available diagram types for Capella.

In [6]:
authorKey(df,'Capella','diaTyp')

[0.0,
 '',
 18.0,
 19.0,
 20.0,
 21.0,
 22.0,
 23.0,
 24.0,
 25.0,
 26.0,
 27.0,
 28.0,
 29.0,
 30.0,
 31.0]

List all available dates.

In [7]:
authorKey(df,'Capella','date')

[1200, 900, 1500, 1100, 1000]


# Checks for missing data

Check for missing diagram types. Use override=True to disable data type checking. 

First, check for an empty diagram type string. 

In [8]:
reducedData(df,[['author','Capella'],['diaTyp','']])

Unnamed: 0,M0,altID,author,biblio,date,diaID,diaTyp,diaURL,foliopage,manID,manURL,textID
0,no attributes,Cape147,Capella,"Paris BN, 13955",900,MAPD0343,,,52v,EW(1),,EW(1)_A
1,no attributes,Cape148,Capella,"Milano BA, E.5 sup.",1200,MAPD0344,,,53r,JG(4),Jg_4&pn=1,JG(4)_A
2,no attributes,Cape151,Capella,"Paris BN, nal 340",1000,MAPD0348,,,82v,QK(4),,QK(4)_A
3,no attributes,Cape152,Capella,"Paris BN, nal 340",1000,MAPD0349,,,83r,QK(4),,QK(4)_A
4,no attributes,Cape153,Capella,"Paris BN, nal 340",1000,MAPD0350,,,83r,QK(4),,QK(4)_A


Then, check for 0.0 float entries. 

In [9]:
reducedData(df,[['author','Capella'],['diaTyp', 0.0]])

Unnamed: 0,M0,altID,author,biblio,date,diaID,diaTyp,diaURL,foliopage,manID,manURL,textID
0,no attributes,Cape116,Capella,"Firenze BL, San Marco, 190",1200,MAPD0310,0,Ao_3&pn=103&dw=1858&dh=901&ww=0.058&wh=0.0677&...,102r,AO(3),Ao_3&pn=1,AO(3)_A
1,no attributes,Cape117,Capella,"Firenze BL, Plut, 51.13",1500,MAPD0311,0,Az_2&pn=83&dw=1858&dh=901&ww=0.0369&wh=0.0439&...,128v,AZ(2),Az_2&pn=1,AZ(2)_A
2,no attributes,Cape113,Capella,"Leiden UB, BPL, 36",900,MAPD0307,0,Ba_4&pn=131&dw=1858&dh=901&ww=0.1147&wh=0.1462...,129r,BA(4),Ba_4&pn=1,BA(4)_A
3,no attributes,Cape118,Capella,"Vaticano BAV, Urb. 329",1500,MAPD0312,0,Bj_2&pn=138&dw=1858&dh=901&ww=0.0488&wh=0.0416...,139v,BJ(2),Bj_2&pn=1,BJ(2)_A
4,no attributes,Cape46,Capella,"Leiden UB, Voss. F.48",900,MAPD0410,0,Lh_2&pn=80&dw=1858&dh=901&ww=0.1094&wh=0.1803&...,79v,LH(2),Lh_2&pn=1,LH(2)_A
5,no attributes,Cape59,Capella,"Leiden UB, Voss. F.48",900,MAPD0424,0,Lh_2&pn=81&dw=1858&dh=901&ww=0.0963&wh=0.1598&...,81r,LH(2),Lh_2&pn=1,LH(2)_A
6,no attributes,Cape60,Capella,"Besancon BM, 594",900,MAPD0426,0,,72v,QY(2),,QY(2)_A
7,no attributes,Cape47,Capella,"Paris BN, 14754",1200,MAPD0411,0,Zz_2&pn=213&dw=1858&dh=901&ww=0.09&wh=0.1092&w...,188r,Z(2)1,Zz_6&pn=1,Z(2)1_A
8,no attributes,Cape114,Capella,"Paris BN, 8671",900,MAPD0308,0,Zz_5&pn=171&dw=1858&dh=901&ww=0.2301&wh=0.148&...,84r,Z(3)8,Zz_3&pn=1,Z(3)8_A
9,no attributes,Cape48,Capella,"Paris BN, 8669",900,MAPD0412,0,Zz_4&pn=126&dw=1858&dh=901&ww=0.0949&wh=0.1488...,122v,Z(4)1,Zz_4&pn=1,Z(4)1_A


Thus, diagrams without attributes are marked either by 0.0 or by an empty string in diaTyp. 

However, filtering for attribute names like M18.1 ... neglects the M0 entries anyway.
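
Both markers can be handled in one step. The following is a minimal sketch on toy data, not on the real dataframe; `is_untyped` is a hypothetical helper and the diaID values are invented for illustration.

```python
import pandas as pd

# Toy data, not taken from the JSON file.
toy = pd.DataFrame({
    'diaID': ['T1', 'T2', 'T3', 'T4'],
    'diaTyp': [18.0, '', 0.0, 22.0],
})

def is_untyped(value):
    """True for the two 'no type' markers: empty string and 0.0."""
    return value == '' or value == 0

# Boolean mask of rows without a diagram type
untyped = toy['diaTyp'].apply(is_untyped)
# Keep only rows that carry a real diagram type
typed = toy[~untyped]
```

Here `typed` keeps only the rows T1 and T4, so later steps do not have to treat '' and 0.0 separately.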

# Links to the Topoi database

To show the image for a given diagram ID, id2image looks up the correct URL in the JSON file and displays the digilib image inside an iframe. All digilib tools should work as expected.

In [10]:
id2image(df,'MAPD0420')

Alternatively, altId2image opens a new tab at edition.topoi.

In [13]:
# altId2image(df,'MAPD0420')

In [14]:
#manID2image(df,'Z(3)8')

The descriptions of the diagram types are available through the following links to the digilib tool. 
On smaller screens, one can use the webbrowser package to open the page in a new tab. 

In [15]:
#import webbrowser
#webbrowser.open('http://www.ancient-astronomy.org/webapplications/domenico/SliderDigilib.html')

For larger screens one can display the content inline.

In [16]:
from IPython.display import HTML
HTML('<iframe src="http://www.ancient-astronomy.org/webapplications/domenico/SliderDigilib.html" width="100%" height="450"></iframe>')

This can be useful for checking the validity of attributes.
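
Building the iframe string by hand is error-prone, so a small helper may be worth having. This is only a sketch: `iframe_html` is a hypothetical function, not part of medDiaJson, and it only assembles the HTML string.

```python
from html import escape

def iframe_html(url, width='100%', height=450):
    """Hypothetical helper: return an <iframe> snippet with quoted attributes."""
    return '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        escape(url, quote=True),
        escape(str(width), quote=True),
        escape(str(height), quote=True),
    )

snippet = iframe_html(
    'http://www.ancient-astronomy.org/webapplications/domenico/SliderDigilib.html')
```

In the notebook the result would be passed to `IPython.display.HTML(snippet)` to render it inline.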

To obtain the description of a diagram type, use the following. Pretty printing requires more work...

In [17]:
diaTypeDescr(df,1)

Latitudes Rectangular (Höhen-Diagramm (Rechteck))

The rectangular presentation of latitudes occurs against a grid created by the intersections of 13 equally spaced horizontal lines with 31 equally spaced vertical lines -- a grid of 12 x 30 small squares. While this is the most common framework for the latitudes in this form, the grid may be drawn with different numbers of lines, e.g., 12 x 20. The 13 horizontal lines, which provide 12 intervals vertically, are the critical element in the layout. As in the circular diagram each planet oscillates between two extreme limits, but here the limits are upper and lower horizontals, rather than the inner and outer circles of the circular form. In the rectangular grid, the path of each planet has a regular wavelike pattern as it oscillates between its latitudinal limits. The names of the seven planets appear vertically on the left side of the grid, In accord with Pliny's text, the sun follows a serpentine path in the two middle intervals, or de

In [23]:
listofTextIDs = uniqueValues(reducedData(df,[['author','Capella']]),'textID')

In [28]:
reducedData(df,[['author','Capella'],['textID',listofTextIDs[1]]])

Unnamed: 0,M22.1,M22.2,M22.3,M22.4,M22.5,M22.6,altID,author,biblio,date,diaID,diaTyp,diaURL,foliopage,manID,manURL,textID
0,?,?,?,?,?,?,Cape37,Capella,"Leiden UB, BPL, 144",1200,MAPD0400,22,,90r,BJ(4),Bj_4&pn=1,BJ(4)_A


In [26]:
textId2imagegrid(df,'Capella',listofTextIDs[2])

Diagrams in text Z(3)8_A from author Capella in manuscript Z(3)8,Unnamed: 1,Unnamed: 2
,,
"MAPD0294, Dia. type: 27.0","MAPD0305, Dia. type: 28.0","MAPD0308, Dia. type: 0.0"
,,
"MAPD0325, Dia. type: 29.0","MAPD0339, Dia. type: 30.0","MAPD0346, Dia. type: 19.0"
,,
"MAPD0364, Dia. type: 18.0","MAPD0422, Dia. type: 23.0","MAPD0438, Dia. type: 24.0"
,,
"MAPD0449, Dia. type: 25.0","MAPD0460, Dia. type: 26.0","MAPD0736, Dia. type: 31.0"


# Plotting diagram attributes and types

Next, replace missing diagram attributes, which are encoded by '?', with None. Diagrams without a type, i.e. 0.0 and '', are removed from the list of diagram types.

In [29]:
dftempList = []
typList = [x for x in authorKey(df,'Capella','diaTyp') if x not in ('',np.float64(0))]
for typ in typList:
    dftemp = reducedData(df,[['author','Capella'],['diaTyp',typ]])
    dftempList.append(dftemp)
dfCapella = pd.concat(dftempList).reset_index(drop=True).replace(['?'],[None])

Now we can count the diagram types per date. The counts are sorted by date for the bokeh plot.

In [30]:
counttempList = []

# Diagram types without the "no type" markers '' and 0.0
typList = [x for x in authorKey(df,'Capella','diaTyp') if x not in ('',np.float64(0))]

# Count the diagrams for every (type, date) combination;
# the dates are the values returned by authorKey(df,'Capella','date') above
for year in [1200, 900, 1500, 1100, 1000]: 
    for typ in typList:
        cnt = dfCapella[(dfCapella['diaTyp']==typ) & (dfCapella['date']==year)].diaTyp.count()
        counttempList.append((typ,year,cnt))
dfCapellaCount = pd.DataFrame(counttempList,columns=['diaTyp','date','count'])
dfCapellaCount.sort_values(by='date',inplace=True)
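
The same kind of count table can probably be produced without explicit loops. The sketch below uses toy data and `pd.crosstab`; it is an alternative to the loop above, not the code used in this notebook, and the toy values are invented.

```python
import pandas as pd

# Toy data, not taken from the JSON file.
toy = pd.DataFrame({
    'diaTyp': [18.0, 18.0, 19.0, 19.0, 22.0],
    'date':   [900, 900, 900, 1200, 1200],
})

# Cross-tabulate date against diagram type; missing combinations become 0.
table = pd.crosstab(toy['date'], toy['diaTyp'])

# Bring the table into long form with one row per (date, diaTyp) pair.
counts = (table.reset_index()
               .melt(id_vars='date', var_name='diaTyp', value_name='count')
               .sort_values(by=['date', 'diaTyp'])
               .reset_index(drop=True))
```

The long form has the same three columns as dfCapellaCount, including the zero counts for combinations that do not occur.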

In [31]:
p0 = Bar(dfCapellaCount,label='diaTyp',values='count',group='date')

show(p0)

The plot shows no entries for date 1000. The check below shows why: all diagrams dated 1000 have an empty diaTyp and were therefore filtered out.

In [32]:
dfCapella1000 = reducedData(df,[['author','Capella'],['date',np.int64(1000)]])
dfCapella1000

Unnamed: 0,M0,altID,author,biblio,date,diaID,diaTyp,diaURL,foliopage,manID,manURL,textID
0,no attributes,Cape151,Capella,"Paris BN, nal 340",1000,MAPD0348,,,82v,QK(4),,QK(4)_A
1,no attributes,Cape152,Capella,"Paris BN, nal 340",1000,MAPD0349,,,83r,QK(4),,QK(4)_A
2,no attributes,Cape153,Capella,"Paris BN, nal 340",1000,MAPD0350,,,83r,QK(4),,QK(4)_A


## Interactive selection of plot features

The interact function from ipywidgets makes it possible to define selectors for plotting.

### Number of attributes per diagram type grouped by date

In [33]:
from ipywidgets import interact

from bokeh.io import push_notebook

def attrPlot(typ=18):
    dfbokeh = diaAttrPlot(df,'Capella',np.float64(typ))
    p1 = Bar(dfbokeh,values='attribute',group='date')
    show(p1)

In [34]:
typList = [x for x in authorKey(df,'Capella','diaTyp') if x not in ('', 0)]
interact(attrPlot,typ=typList)


<function attrPlot at 0x7fa488f17d08>

### Number of diagrams per date for each diagram type

In [35]:
from ipywidgets import interact

from bokeh.io import push_notebook

def diagIdUpdate(ide):
    dfp2 = dfCapellaCount[dfCapellaCount.diaTyp==ide]
    p2 = Bar(dfp2,label='diaTyp',values='count',group='date')
    show(p2)

In [36]:
# The tuple (18,31) gives an integer slider over the Capella diagram types
interact(diagIdUpdate,ide = (18,31))

<function diagIdUpdate at 0x7fa488fe7598>

## Occurrence of diagram types in each manuscript

First, create a list of DataFrames with the required information.
Diagram types range from 18 to 31 for the author Capella. 
Thus, a DataFrame with this range is created. 

Then, a dictionary is created which encodes the occurrence of each diagram type in the manuscript (1 if present, 0 if not); it is applied to the column 'Count'. 
Finally, columns with the origin and date are added. 

In [37]:
biblioList = []

for biblio in authorKey(df,'Capella','biblio'):
    # Create reduced dataframe
    resTemp = reducedData(df,[['author','Capella'],['biblio',biblio]])
    # Drop all columns apart from date, biblio and diaTyp
    temp = resTemp.drop([x for x in resTemp.columns if x not in ('date','biblio','diaTyp')],axis=1)
    # create mapping for diagram types which are present in this dataframe diaTyp : 1
    d1 = {int(x):1 for x in list(temp['diaTyp'].values) if x not in ['']}
    # and for those not present diaTyp : 0
    s1 = set(x for x in temp['diaTyp'].values if x not in [''])
    s2 = set(range(18,32))
    # set difference: possible types that do not occur in this manuscript
    d2 = {int(x):0 for x in s2 - s1}
    # combine the two dicts
    d0 = d1.copy()
    d0.update(d2)
    #Works in python 3.5
    #d0 = {**d1, **d2}
    # Create new dataframe with all possible diagram types for Capella
    dfTEMP = pd.DataFrame(list(zip(list(range(18,32)),[0]*14)),index=range(14),columns=['diaTyp','Count'])
    # apply the mapping
    dfTEMP['Count'] = dfTEMP['diaTyp'].map(d0)
    # copy information for biblio and date
    dfTEMP['biblio'] = biblio
    dfTEMP['date'] = temp['date'][0]
    # sort in place (sort_values returns None when inplace=True)
    dfTEMP.sort_values(by='diaTyp',inplace=True)
    # append to list of dataframes
    biblioList.append(dfTEMP)
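
The mapping step on its own can be written more compactly. A minimal sketch, assuming that '' is the only non-numeric marker in diaTyp; `occurrence_map` is a hypothetical helper and is not used elsewhere in this notebook.

```python
def occurrence_map(present_types, all_types=range(18, 32)):
    """Hypothetical helper: map each possible type to 1 (present) or 0 (absent)."""
    # Drop the empty-string marker and normalise floats such as 18.0 to int.
    present = {int(t) for t in present_types if t != ''}
    return {t: int(t in present) for t in all_types}

# Toy input mixing real types with the two "no type" markers.
occ = occurrence_map([18.0, 22.0, '', 0.0])
```

The 0.0 marker ends up in `present` as 0, but it lies outside the range 18 to 31 and therefore never appears in the result.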

To give interact a list of origin names, we need a function which operates on these names. 
For this purpose we can use a list comprehension with string comparison. See the line data = ...

In [38]:
from ipywidgets import interact

from bokeh.io import push_notebook, gridplot
from bokeh.plotting import figure
from bokeh.charts import Bar
from bokeh.models import FixedTicker, Legend
from bokeh.palettes import viridis

def biblioDiaTyp(biblio):
    # Select the DataFrame whose biblio entry equals the chosen string. 
    data = [d for d in biblioList if d['biblio'][0] == biblio][0]
    # Set title of plot for better info
    titleS = 'Origin: ' + data['biblio'][0] + '; Date: ' + str(data['date'][0]) + ' CE'
    # use palette=viridis(14) to get different color for every diagram typ bar
    b1 = Bar(data,title=titleS,label='biblio',
             values='Count',group='diaTyp',bar_width=1,ylabel='Diagrams',palette=viridis(14))
    b1.xaxis.major_label_orientation = "horizontal"
    b1.xaxis.axis_label=''
    b1.legend.location = "right_center"
    b1.legend.background_fill_alpha=0.5
    show(b1)

In [39]:
biblioKeys = sorted([x for x in authorKey(df,'Capella','biblio')])

In [40]:
interact(biblioDiaTyp,biblio=biblioKeys)

<function biblioDiaTyp at 0x7fa488f8c158>

### Compare all manuscripts with same date

Building all plots takes a few seconds. Plots are sorted by the number of occurring diagram types, so manuscripts with the most diagram types come first. 

In [41]:
def plotDateGrid(date):
    # Assert given date is available. 
    assert date in authorKey(df,'Capella','date'), 'No entries for this date.' 
    # Create list of matching dataframes
    tempList = [d for d in biblioList if d['date'][0] == date]
    # Sort by number of occurring diagram types, largest first
    dfDATE = sorted(tempList,key=lambda d: d['Count'].sum(),reverse=True)
    plotListDATE = []  
    for x in range(len(dfDATE)):
        titleS = dfDATE[x]['biblio'][0] + '; ' + str(dfDATE[x]['date'][0]) + ' CE'
        b0 = Bar(dfDATE[x],title=titleS,label='biblio',values='Count',group='diaTyp',
             bar_width=1,ylabel='Diagrams',palette=viridis(14),width=250,height=250,
                legend=False)
        b0.xaxis.major_label_orientation = "horizontal"
        b0.xaxis.axis_label=''
        plotListDATE.append(b0)
    plotGrid = gridplot(plotListDATE,ncols=3)
    show(plotGrid)

In [42]:
plotDateGrid(900)
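
The selection and sorting inside plotDateGrid can be tried without any plotting. The sketch below uses invented toy manuscripts; `make_toy` and `select_and_sort` are hypothetical helpers for illustration only.

```python
import pandas as pd

def make_toy(biblio, date, counts):
    """Build a toy per-manuscript frame with three diagram types."""
    return pd.DataFrame({'diaTyp': [18, 19, 20], 'Count': counts,
                         'biblio': biblio, 'date': date})

# Toy manuscripts, not taken from the JSON file.
toyList = [
    make_toy('Toy MS A', 900, [1, 0, 0]),
    make_toy('Toy MS B', 900, [1, 1, 1]),
    make_toy('Toy MS C', 1200, [1, 1, 0]),
]

def select_and_sort(frames, date):
    """Keep frames of the given date, most diagram types first."""
    same_date = [f for f in frames if f['date'].iloc[0] == date]
    return sorted(same_date, key=lambda f: f['Count'].sum(), reverse=True)

ordered = [f['biblio'].iloc[0] for f in select_and_sort(toyList, 900)]
```

For date 900 this puts 'Toy MS B' before 'Toy MS A', which mirrors the order of the plots in the grid.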

The next step is to compare diagram attributes by biblio index and diagram type.