# Programming That Depends on Data Structures

## Converting Roman Numerals

* [Roman numerals - Wikipedia](https://en.wikipedia.org/wiki/Roman_numerals)

### Using Data from Wikipedia

Use the pandas library function `read_html()` to read the tables on the Wikipedia page:

* [Web scraping - Wikipedia](https://en.wikipedia.org/wiki/Web_scraping)
    - [pandas.read_html — pandas 2.0.1 documentation](https://pandas.pydata.org/docs/reference/api/pandas.read_html.html)

In [2]:
import pandas as pd
tables = pd.read_html('https://en.wikipedia.org/wiki/Roman_numerals')

In [6]:
tables[1]

Unnamed: 0,0
0,Hindu-Arabic numerals Western Arabic Eastern A...
1,East Asian systems Contemporary Chinese Suzhou...
2,Other systems History Ancient Babylonian Post-...
3,By radix/base Common radices/bases 2 3 4 5 6 8...


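Scraped data cannot always be used as-is. When it cannot, fix it. Here the first cell of the table picks up inline CSS from the page: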

In [7]:
df1 = tables[2]
df1

Unnamed: 0,0,1,2,3,4,5,6
0,".mw-parser-output .roman-numeral{font-family:""...",V,X,L,C,D,M
1,1,5,10,50,100,500,1000


In [8]:
df1.columns

Index([0, 1, 2, 3, 4, 5, 6], dtype='int64')

In [9]:
df1[0][0]

'.mw-parser-output .roman-numeral{font-family:"Nimbus Roman No9 L","Times New Roman",Times,serif;font-size:118%;line-height:1}.mw-parser-output .roman-numeral-a{border:1px solid}.mw-parser-output .roman-numeral-t{border-top:1px solid}.mw-parser-output .roman-numeral-v{border:solid;border-width:0 1px;padding:0 2px}.mw-parser-output .roman-numeral-h{border:solid;border-width:1px 0}.mw-parser-output .roman-numeral-tv{border:1px solid;border-bottom:none;padding:0 2px}I'

In [6]:
df1.loc[0, 0] = 'I'  # assign with .loc; chained indexing (df1[0][0] = ...) may not update df1

In [7]:
df1

Unnamed: 0,0,1,2,3,4,5,6
0,I,V,X,L,C,D,M
1,1,5,10,50,100,500,1000
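
The cell above is corrected by hand. As a hedged alternative, assuming the unwanted prefix is always inline CSS that ends with `}`, the text after the last closing brace can be kept instead. The helper name `strip_css_prefix` is hypothetical and not part of pandas, and `raw` is a shortened stand-in for the scraped value.

```python
def strip_css_prefix(cell: str) -> str:
    """Return the text after the last '}' in cell, or cell unchanged if there is none."""
    # rpartition splits on the last '}'; tail is everything after it.
    _, brace, tail = cell.rpartition("}")
    return tail.strip() if brace else cell


# A shortened stand-in for the value of the first cell shown above.
raw = '.mw-parser-output .roman-numeral{font-size:118%;line-height:1}I'
print(strip_css_prefix(raw))  # I
print(strip_css_prefix("V"))  # V
```

This avoids retyping the value, but it would also drop legitimate text that happens to contain a `}`, so it only suits cells like this one.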


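Write a program that converts decimal numbers to Roman numerals by looking them up in the table:
* Reset the `index` (row labels)
* The program could work on the table directly, but here the table is converted to a dictionary by way of JSON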
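
To illustrate the remark that the table itself can drive the program, here is a minimal sketch of a direct lookup with `DataFrame.loc`. The small frame `table` and its `digit` index are hand-built stand-ins for part of the Wikipedia table, not the scraped data.

```python
import pandas as pd

# Hand-built stand-in for a few rows of the table (not the scraped data).
table = pd.DataFrame(
    {
        "digit": [3, 4, 5],
        "Thousands": ["MMM", None, None],
        "Hundreds": ["CCC", "CD", "D"],
    }
).set_index("digit")

# Look up one cell by row label (the digit) and column label (the place).
print(table.loc[4, "Hundreds"])  # CD

# Empty cells are missing values; pd.isna() detects them.
print(pd.isna(table.loc[4, "Thousands"]))  # True
```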

In [9]:
df2 = tables[3]
df2

Unnamed: 0.1,Unnamed: 0,Thousands,Hundreds,Tens,Units
0,1,M,C,X,I
1,2,MM,CC,XX,II
2,3,MMM,CCC,XXX,III
3,4,,CD,XL,IV
4,5,,D,L,V
5,6,,DC,LX,VI
6,7,,DCC,LXX,VII
7,8,,DCCC,LXXX,VIII
8,9,,CM,XC,IX


In [10]:
df_new = df2.rename(columns={'Unnamed: 0':'index'}).set_index('index')
df_new

index,Thousands,Hundreds,Tens,Units
1,M,C,X,I
2,MM,CC,XX,II
3,MMM,CCC,XXX,III
4,,CD,XL,IV
5,,D,L,V
6,,DC,LX,VI
7,,DCC,LXX,VII
8,,DCCC,LXXX,VIII
9,,CM,XC,IX


* [NaN - Wikipedia](https://en.wikipedia.org/wiki/NaN)
    - [Working with missing data — pandas 2.0.1 documentation](https://pandas.pydata.org/docs/user_guide/missing_data.html)

In [14]:
df_new.to_json()

'{"Thousands":{"1":"M","2":"MM","3":"MMM","4":null,"5":null,"6":null,"7":null,"8":null,"9":null},"Hundreds":{"1":"C","2":"CC","3":"CCC","4":"CD","5":"D","6":"DC","7":"DCC","8":"DCCC","9":"CM"},"Tens":{"1":"X","2":"XX","3":"XXX","4":"XL","5":"L","6":"LX","7":"LXX","8":"LXXX","9":"XC"},"Units":{"1":"I","2":"II","3":"III","4":"IV","5":"V","6":"VI","7":"VII","8":"VIII","9":"IX"}}'

In [23]:
import json

In [34]:
df_json = json.loads(df_new.to_json())
print(json.dumps(df_json, indent=4))

{
    "Thousands": {
        "1": "M",
        "2": "MM",
        "3": "MMM",
        "4": null,
        "5": null,
        "6": null,
        "7": null,
        "8": null,
        "9": null
    },
    "Hundreds": {
        "1": "C",
        "2": "CC",
        "3": "CCC",
        "4": "CD",
        "5": "D",
        "6": "DC",
        "7": "DCC",
        "8": "DCCC",
        "9": "CM"
    },
    "Tens": {
        "1": "X",
        "2": "XX",
        "3": "XXX",
        "4": "XL",
        "5": "L",
        "6": "LX",
        "7": "LXX",
        "8": "LXXX",
        "9": "XC"
    },
    "Units": {
        "1": "I",
        "2": "II",
        "3": "III",
        "4": "IV",
        "5": "V",
        "6": "VI",
        "7": "VII",
        "8": "VIII",
        "9": "IX"
    }
}
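
After `json.loads`, every key is a string such as `"4"`, and JSON `null` becomes Python `None` (which `json.dumps` prints as `null` again). The following is a minimal sketch, assuming that integer keys and empty strings are more convenient for later lookups. The names `raw`, `parsed`, and `lookup` are illustrative, and `raw` is a shortened copy of the JSON text above.

```python
import json

# A shortened copy of the JSON text produced by to_json() above.
raw = '{"Thousands": {"3": "MMM", "4": null}, "Hundreds": {"3": "CCC", "4": "CD"}}'
parsed = json.loads(raw)

# Convert keys to int and replace None (JSON null) with an empty string.
lookup = {
    place: {int(digit): (numeral or "") for digit, numeral in column.items()}
    for place, column in parsed.items()
}

print(parsed["Thousands"]["4"])      # None
print(lookup["Hundreds"][4])         # CD
print(repr(lookup["Thousands"][4]))  # ''
```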


In [35]:
%%ai_ask
Convert the following table into a Python dictionary to use when converting decimal numbers to Roman numerals
```json
{
    "Thousands": {
        "1": "M",
        "2": "MM",
        "3": "MMM",
        "4": null,
        "5": null,
        "6": null,
        "7": null,
        "8": null,
        "9": null
    },
    "Hundreds": {
        "1": "C",
        "2": "CC",
        "3": "CCC",
        "4": "CD",
        "5": "D",
        "6": "DC",
        "7": "DCC",
        "8": "DCCC",
        "9": "CM"
    },
    "Tens": {
        "1": "X",
        "2": "XX",
        "3": "XXX",
        "4": "XL",
        "5": "L",
        "6": "LX",
        "7": "LXX",
        "8": "LXXX",
        "9": "XC"
    },
    "Units": {
        "1": "I",
        "2": "II",
        "3": "III",
        "4": "IV",
        "5": "V",
        "6": "VI",
        "7": "VII",
        "8": "VIII",
        "9": "IX"
    }
}
```

A Python dictionary can be created as follows.

```python
roman_dict = {
    "Thousands": {
        "1": "M",
        "2": "MM",
        "3": "MMM",
        "4": None,
        "5": None,
        "6": None,
        "7": None,
        "8": None,
        "9": None
    },
    "Hundreds": {
        "1": "C",
        "2": "CC",
        "3": "CCC",
        "4": "CD",
        "5": "D",
        "6": "DC",
        "7": "DCC",
        "8": "DCCC",
        "9": "CM"
    },
    "Tens": {
        "1": "X",
        "2": "XX",
        "3": "XXX",
        "4": "XL",
        "5": "L",
        "6": "LX",
        "7": "LXX",
        "8": "LXXX",
        "9": "XC"
    },
    "Units": {
        "1": "I",
        "2": "II",
        "3": "III",
        "4": "IV",
        "5": "V",
        "6": "VI",
        "7": "VII",
        "8": "VIII",
        "9": "IX"
    }
}
```

Using this dictionary, converting 3,456 to a Roman numeral, for example, looks like this:

```python
thousands = roman_dict["Thousands"]["3"]
hundreds = roman_dict["Hundreds"]["4"]
tens = roman_dict["Tens"]["5"]
units = roman_dict["Units"]["6"]
roman_numeral = thousands + hundreds + tens + units
print(roman_numeral)  # Output: MMMCDLVI
```
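
The snippet above hard-codes the digits of 3,456. A minimal sketch of generalizing it to any integer from 1 to 3999 follows. The function name `to_roman` and the helper `NUMERALS` are illustrative. The sketch rebuilds the same nested mapping with the `None` entries omitted and assumes that a digit of 0, or any missing entry, contributes nothing to the result.

```python
# Numerals for digits 1..9 (1..3 for thousands), in the same order as the table.
NUMERALS = {
    "Thousands": "M MM MMM".split(),
    "Hundreds": "C CC CCC CD D DC DCC DCCC CM".split(),
    "Tens": "X XX XXX XL L LX LXX LXXX XC".split(),
    "Units": "I II III IV V VI VII VIII IX".split(),
}

# Rebuild the nested mapping: place -> {"1": "M", "2": "MM", ...}.
roman_dict = {
    place: {str(i): numeral for i, numeral in enumerate(numerals, start=1)}
    for place, numerals in NUMERALS.items()
}


def to_roman(number: int) -> str:
    """Convert an integer from 1 to 3999 to a Roman numeral by table lookup."""
    if not 1 <= number <= 3999:
        raise ValueError("number must be between 1 and 3999")
    digits = f"{number:04d}"  # zero-pad to four digits, e.g. 56 -> "0056"
    parts = []
    # Dictionaries keep insertion order, so places run from Thousands to Units.
    for place, digit in zip(roman_dict, digits):
        # Digit 0 (and any missing or None entry) contributes nothing.
        parts.append(roman_dict[place].get(digit) or "")
    return "".join(parts)


print(to_roman(3456))  # MMMCDLVI
print(to_roman(1994))  # MCMXCIV
print(to_roman(56))    # LVI
```

Because each decimal place is looked up independently, the conversion needs no arithmetic beyond splitting the number into digits; the table carries all the rules, including subtractive forms such as `CD` and `IX`.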