This notebook describes **Lemuras** data types and shows how to read and write **Table** objects in **CSV, SQL, JSON** and **HTML** formats.

### Sample data

In [1]:
from lemuras import Table
from datetime import datetime, timedelta

def mkdt(ds):
    # Return the datetime that lies `ds` days before the current moment
    return datetime.now() - timedelta(days=ds)

cols = ['type', 'size', 'weight', 'when', 'tel']
rows = [
  ['A', 1, 12, mkdt(3), '+79360360193'],
  ['B', 4, 12, mkdt(33), 84505505151],
  ['A', 3, 10, mkdt(48), '+31415926535'],
  ['B', 6, 14, mkdt(333), 0],
  ['A', 4, 15, mkdt(209), None],
  ['A', 2, 11, mkdt(192), ''],
]
df1 = Table(cols, rows, 'Sample')
df1

'type','size','weight','when','tel'
'A',1,12,2018-11-29 13:40:08.116768,'+79360360193'
'B',4,12,2018-10-30 13:40:08.116775,84505505151
'A',3,10,2018-10-15 13:40:08.116776,'+31415926535'
'B',6,14,2018-01-03 13:40:08.116777,0
'A',4,15,2018-05-07 13:40:08.116778,
'A',2,11,2018-05-24 13:40:08.116779,''


# Data types

A Lemuras Table object consists of native Python lists, so it can contain any objects that support **str** and **repr**. However, there is advanced built-in support for the following types:

- **int** – identifier **`i`**.
- **float** – identifier **`f`**.
- **str** – identifier **`s`**.
- **date** – identifier **`d`**.
- **datetime** – identifier **`t`**.

You may also encounter the identifier **`m`**, which means that a column has multiple (mixed) types.

The **`.get_type()`** method of a **Column** object returns a tuple with the type identifier and the maximum number of symbols needed:

In [2]:
df1['weight'].get_type()

('i', 2)
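
To make the meaning of such a pair more concrete, here is a minimal plain-Python sketch of how a type identifier and a maximum symbol length could be derived for a list of values. It is not the Lemuras implementation: the function name `sketch_type` is hypothetical, and skipping **None** values is an assumption.

```python
from datetime import date, datetime

def sketch_type(values):
    """Return (identifier, max_len) for a list of values.

    Hypothetical helper; None values are skipped (an assumption).
    """
    idents = set()
    max_len = 0
    for v in values:
        if v is None:
            continue
        if isinstance(v, int):
            idents.add('i')
        elif isinstance(v, float):
            idents.add('f')
        elif isinstance(v, str):
            idents.add('s')
        elif isinstance(v, datetime):
            # datetime must be checked before date, because it subclasses date
            idents.add('t')
        elif isinstance(v, date):
            idents.add('d')
        else:
            idents.add('m')
        max_len = max(max_len, len(str(v)))
    # More than one distinct identifier (or none at all) is reported as mixed
    ident = idents.pop() if len(idents) == 1 else 'm'
    return ident, max_len

print(sketch_type([12, 12, 10, 14, 15, 11]))
print(sketch_type(['+79360360193', 84505505151, 0, None, '']))
```

With the `weight` and `tel` values from the sample data, the sketch yields `('i', 2)` and `('m', 12)`.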

The **`.find_types()`** method of a **Table** object returns a new **Table** with the types of each column:

In [3]:
df1.find_types()

'Column','Type','Symbols'
'type','s',1
'size','i',1
'weight','i',2
'when','t',26
'tel','m',12


# Save CSV

Use the **`.to_csv`** method to save the data as comma-separated values. If you specify an argument, it is used as the filename for saving the result.

In [4]:
txt = df1.to_csv()
txt

'type,size,weight,when,tel\r\nA,1,12,2018-11-29 13:40:08.116768,+79360360193\r\nB,4,12,2018-10-30 13:40:08.116775,84505505151\r\nA,3,10,2018-10-15 13:40:08.116776,+31415926535\r\nB,6,14,2018-01-03 13:40:08.116777,0\r\nA,4,15,2018-05-07 13:40:08.116778,\r\nA,2,11,2018-05-24 13:40:08.116779,\r\n'
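
The `\r\n` line endings and the empty field for both **None** and `''` match what Python's standard `csv` module produces. Whether Lemuras relies on that module internally is an assumption; the sketch below only shows the standard-library behavior on a reduced copy of the sample rows.

```python
import csv
import io

columns = ['type', 'size', 'tel']
rows = [['A', 1, '+79360360193'], ['A', 4, None], ['A', 2, '']]

buffer = io.StringIO()
writer = csv.writer(buffer)  # the default line terminator is '\r\n'
writer.writerow(columns)
writer.writerows(rows)       # None is written as an empty field
csv_text = buffer.getvalue()
print(repr(csv_text))
```

Note that after such a round trip **None** and an empty string can no longer be told apart.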

# Load CSV

Use the **`.from_csv`** class method to create a Table object from the given CSV data.

If the **`inline`** argument is True, the first argument is treated as the CSV text itself. Otherwise, it is treated as the name of a CSV file.

In [5]:
df2 = Table.from_csv(txt, inline=True, title='SomeData')
df2

'type','size','weight','when','tel'
'A',1,12,2018-11-29 13:40:08,79360360193
'B',4,12,2018-10-30 13:40:08,84505505151
'A',3,10,2018-10-15 13:40:08,31415926535
'B',6,14,2018-01-03 13:40:08,0
'A',4,15,2018-05-07 13:40:08,''
'A',2,11,2018-05-24 13:40:08,''


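The detected types are the same as before serialization, although the symbol counts are smaller because the `when` values lost their microseconds and the `tel` values lost the leading `+`: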

In [6]:
df2.find_types()

'Column','Type','Symbols'
'type','s',1
'size','i',1
'weight','i',2
'when','t',19
'tel','m',11


You can also use the **`empty`** argument to specify a value that replaces **None** values:

In [7]:
df2 = Table.from_csv(txt, inline=True, empty=0, title='OtherData')
df2

'type','size','weight','when','tel'
'A',1,12,2018-11-29 13:40:08,79360360193
'B',4,12,2018-10-30 13:40:08,84505505151
'A',3,10,2018-10-15 13:40:08,31415926535
'B',6,14,2018-01-03 13:40:08,0
'A',4,15,2018-05-07 13:40:08,''
'A',2,11,2018-05-24 13:40:08,''


Alternatively, you can disable preprocessing of the data to keep all values as strings:

In [8]:
df2 = Table.from_csv(txt, inline=True, preprocess=False)
df2

'type','size','weight','when','tel'
'A','1','12','2018-11-29 13:40:08.116768','+79360360193'
'B','4','12','2018-10-30 13:40:08.116775','84505505151'
'A','3','10','2018-10-15 13:40:08.116776','+31415926535'
'B','6','14','2018-01-03 13:40:08.116777','0'
'A','4','15','2018-05-07 13:40:08.116778',''
'A','2','11','2018-05-24 13:40:08.116779',''
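
To illustrate what preprocessing does to such string cells, here is a rough plain-Python sketch that tries to turn a cell into an `int`, a `float` or a `datetime` and otherwise leaves it as a string. The helper `parse_cell` is hypothetical and is not the Lemuras implementation; for example, it keeps microseconds, while the Lemuras output above drops them.

```python
from datetime import datetime

def parse_cell(text):
    """Guess a Python value from a CSV cell (hypothetical helper)."""
    # Numbers first: int() also accepts a leading '+', as in '+79360360193'
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    # Then the two datetime layouts that appear in this notebook
    for fmt in ('%Y-%m-%d %H:%M:%S.%f', '%Y-%m-%d %H:%M:%S'):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    # Anything else (including the empty string) stays a string
    return text

row = ['A', '1', '12', '2018-11-29 13:40:08.116768', '+79360360193']
print([parse_cell(cell) for cell in row])
```

This also shows why the phone numbers lose their leading `+` once they are read back as integers.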


# Save SQL creation code & values

Lemuras can also work with SQL. It can generate SQL table creation code, using the automatic column type detection described earlier.

In [9]:
sql_cr = df1.to_sql_create()
print(sql_cr)

CREATE TABLE `Sample` (
  `type` varchar(1),
  `size` int(1),
  `weight` int(1),
  `when` datetime,
  `tel` varchar(12)
) ;


You can also get the code that fills the table with data:

In [10]:
sql_vals = df1.to_sql_values()
print(sql_vals)

INSERT INTO `Sample` VALUES ('A',1,12,'2018-11-29 13:40:08.116768','+79360360193'), ('B',4,12,'2018-10-30 13:40:08.116775','84505505151'), ('A',3,10,'2018-10-15 13:40:08.116776','+31415926535'), ('B',6,14,'2018-01-03 13:40:08.116777','0'), ('A',4,15,'2018-05-07 13:40:08.116778','None'), ('A',2,11,'2018-05-24 13:40:08.116779','');
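
As a quick check, statements like these can be executed against a database. The sketch below uses an in-memory SQLite database from the standard library, which is an assumption about the target: the backtick quoting looks MySQL-flavored, but SQLite accepts it too. The statements are copied from the output above and shortened to two rows.

```python
import sqlite3

sql_create = """CREATE TABLE `Sample` (
  `type` varchar(1),
  `size` int(1),
  `weight` int(1),
  `when` datetime,
  `tel` varchar(12)
) ;"""
sql_values = ("INSERT INTO `Sample` VALUES "
              "('A',1,12,'2018-11-29 13:40:08.116768','+79360360193'), "
              "('B',4,12,'2018-10-30 13:40:08.116775','84505505151');")

conn = sqlite3.connect(':memory:')
conn.execute(sql_create)   # build the table structure
conn.execute(sql_values)   # fill it with the rows
fetched = conn.execute(
    'SELECT `type`, `size`, `tel` FROM `Sample` ORDER BY rowid'
).fetchall()
conn.close()
print(fetched)
```

Because `tel` is declared as `varchar`, SQLite returns both phone numbers as strings.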


# Load SQL creation code & values

First, load the table declaration to retrieve the structure:

In [11]:
df2 = Table.from_sql_create(sql_cr)
df2

'type','size','weight','when','tel'


Then, supply the data:

In [12]:
df2.add_sql_values(sql_vals)
df2

'type','size','weight','when','tel'
'A',1,12,2018-11-29 13:40:08,79360360193
'B',4,12,2018-10-30 13:40:08,84505505151
'A',3,10,2018-10-15 13:40:08,31415926535
'B',6,14,2018-01-03 13:40:08,0
'A',4,15,2018-05-07 13:40:08,''
'A',2,11,2018-05-24 13:40:08,''


# Load SQL query result

In addition, you can create a Table object from a query result string:

In [13]:
sql_result = """+------+------------------+--------+--------+--------+
| id   | name_rus                         | q1     | q2     | q3     |
+------+----------------------------------+--------+--------+--------+
| 1205 | Джем клубничный                  |   3286 |     10 |     14 |
| 1306 | Мед                              |    800 |     19 |     19 |
| 1110 | Блины                            |   5140 |     18 |      3 |
| 2805 | Бургер                           |  18067 |  90817 |  61933 |
| 2604 | Пирожок                          |  47150 | 215139 | 170291 |
| 4446 | Чизкейк                          |   6856 |  17665 |  12808 |
| 4248 | Маффин с яблоком и корицей       |   1765 |   4176 |   2385 |
| 4753 | Вафельный рожок                  |   2158 |  16577 |  11725 |
+------+----------------------------------+--------+--------+--------+"""

df2 = Table.from_sql_result(sql_result, title='Goods')
df2

'id','name_rus','q1','q2','q3'
1205,'Джем клубничный',3286,10,14
1306,'Мед',800,19,19
1110,'Блины',5140,18,3
2805,'Бургер',18067,90817,61933
...,...,...,...,...


**Warning!** The string must start with the `+--` characters, without any leading whitespace. This will be simplified in the future.
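
If the query result comes from an indented string literal or a copied terminal session, you can normalize it first. The helper below is a hypothetical workaround in plain Python, not part of Lemuras; stripping every line (and not only the start of the string) is an extra precaution.

```python
def clean_sql_result(raw):
    """Drop surrounding blank lines and per-line indentation (hypothetical helper)."""
    lines = [line.strip() for line in raw.strip().splitlines()]
    return '\n'.join(lines)

raw = """
    +----+------+
    | id | name |
    +----+------+
    |  1 | Pi   |
    +----+------+
"""
cleaned = clean_sql_result(raw)
print(cleaned)
```

The cleaned string starts with `+--` and could then be passed to `Table.from_sql_result`.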

# Save JSON

You can save a Table object to a JSON string. By default, rows are stored as lists. Set **`pretty`** to True to get more readable text:

In [14]:
s = df1.to_json(pretty=True)
print(s)

{
  "columns": [
    "type", "size", "weight", "when", "tel"
  ], 
  "rows": [
    [
      "A", 1, 12, "2018-11-29 13:40:08.116768", "+79360360193"
    ], [
      "B", 4, 12, "2018-10-30 13:40:08.116775", 84505505151
    ], [
      "A", 3, 10, "2018-10-15 13:40:08.116776", "+31415926535"
    ], [
      "B", 6, 14, "2018-01-03 13:40:08.116777", 0
    ], [
      "A", 4, 15, "2018-05-07 13:40:08.116778", "None"
    ], [
      "A", 2, 11, "2018-05-24 13:40:08.116779", ""
    ]
  ], 
  "title": "Sample"
}


Or you can save rows as objects (though it is much less compact):

In [15]:
s = df1.to_json(as_dict=True, pretty=True)
print(s)

{
  "columns": [
    "type", "size", "weight", "when", "tel"
  ], 
  "rows": [
    {
      "type": "A", "size": 1, "weight": 12, "when": "2018-11-29 13:40:08.116768", "tel": "+79360360193"
    }, {
      "type": "B", "size": 4, "weight": 12, "when": "2018-10-30 13:40:08.116775", "tel": 84505505151
    }, {
      "type": "A", "size": 3, "weight": 10, "when": "2018-10-15 13:40:08.116776", "tel": "+31415926535"
    }, {
      "type": "B", "size": 6, "weight": 14, "when": "2018-01-03 13:40:08.116777", "tel": 0
    }, {
      "type": "A", "size": 4, "weight": 15, "when": "2018-05-07 13:40:08.116778", "tel": "None"
    }, {
      "type": "A", "size": 2, "weight": 11, "when": "2018-05-24 13:40:08.116779", "tel": ""
    }
  ], 
  "title": "Sample"
}
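
The two layouts carry the same information, so one can be converted into the other. Here is a small sketch in plain Python that works on dictionaries shaped like the output above; the function names are hypothetical, and the data is a reduced copy of the sample.

```python
import json

as_lists = {
    "columns": ["type", "size"],
    "rows": [["A", 1], ["B", 4]],
    "title": "Sample",
}

def rows_to_dicts(doc):
    """Convert the rows-as-lists layout into the rows-as-objects layout."""
    cols = doc["columns"]
    out = dict(doc)
    out["rows"] = [dict(zip(cols, row)) for row in doc["rows"]]
    return out

def rows_to_lists(doc):
    """Convert the rows-as-objects layout back into rows-as-lists."""
    cols = doc["columns"]
    out = dict(doc)
    out["rows"] = [[row[c] for c in cols] for row in doc["rows"]]
    return out

as_dicts = rows_to_dicts(as_lists)
print(json.dumps(as_dicts, indent=2))
```

The object layout repeats every column name in every row, which is why it is less compact.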


# Load JSON

You can load a JSON string in either of the two formats shown above (the **title** key is optional):

In [16]:
df2 = Table.from_json(s)
df2

'type','size','weight','when','tel'
'A',1,12,2018-11-29 13:40:08,79360360193
'B',4,12,2018-10-30 13:40:08,84505505151
'A',3,10,2018-10-15 13:40:08,31415926535
'B',6,14,2018-01-03 13:40:08,0
'A',4,15,2018-05-07 13:40:08,''
'A',2,11,2018-05-24 13:40:08,''


# Save HTML

To save data as an HTML table, use the **`.html()`** instance method. By default, rows and columns are cut; set the optional *`cut`* parameter to False to turn this off:

In [17]:
df1.html(cut=False)

"<table>\n<tr><th>'type'</th><th>'size'</th><th>'weight'</th><th>'when'</th><th>'tel'</th></tr>\n<tr><td>'A'</td><td>1</td><td>12</td><td>2018-11-29 13:40:08.116768</td><td>'+79360360193'</td></tr>\n<tr><td>'B'</td><td>4</td><td>12</td><td>2018-10-30 13:40:08.116775</td><td>84505505151</td></tr>\n<tr><td>'A'</td><td>3</td><td>10</td><td>2018-10-15 13:40:08.116776</td><td>'+31415926535'</td></tr>\n<tr><td>'B'</td><td>6</td><td>14</td><td>2018-01-03 13:40:08.116777</td><td>0</td></tr>\n<tr><td>'A'</td><td>4</td><td>15</td><td>2018-05-07 13:40:08.116778</td><td>None</td></tr>\n<tr><td>'A'</td><td>2</td><td>11</td><td>2018-05-24 13:40:08.116779</td><td>''</td></tr>\n</table>"

By the way, the output of Table objects in these Jupyter notebooks is implemented using this method.

In [18]:
df1

'type','size','weight','when','tel'
'A',1,12,2018-11-29 13:40:08.116768,'+79360360193'
'B',4,12,2018-10-30 13:40:08.116775,84505505151
'A',3,10,2018-10-15 13:40:08.116776,'+31415926535'
'B',6,14,2018-01-03 13:40:08.116777,0
'A',4,15,2018-05-07 13:40:08.116778,
'A',2,11,2018-05-24 13:40:08.116779,''
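
Jupyter renders an object as HTML when the object defines a `_repr_html_` method. The sketch below shows that mechanism with a hypothetical `MiniTable` class. It is not the Lemuras source, only an illustration of how an `html()` method can be wired into notebook output.

```python
from html import escape

class MiniTable:
    """Hypothetical stand-in for a table class; not part of Lemuras."""

    def __init__(self, columns, rows):
        self.columns = columns
        self.rows = rows

    def html(self):
        # repr() keeps the quotes around strings, as in the output above
        head = ''.join(
            '<th>{}</th>'.format(escape(repr(c), quote=False))
            for c in self.columns
        )
        body = ''.join(
            '<tr>{}</tr>\n'.format(''.join(
                '<td>{}</td>'.format(escape(repr(v), quote=False)) for v in row
            ))
            for row in self.rows
        )
        return '<table>\n<tr>{}</tr>\n{}</table>'.format(head, body)

    def _repr_html_(self):
        # IPython/Jupyter looks for this method when displaying an object
        return self.html()

t = MiniTable(['type', 'size'], [['A', 1]])
print(t._repr_html_())
```

In a notebook, evaluating `t` as the last expression of a cell would show the rendered table instead of the raw markup.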


# Load HTML

Lemuras v1.1.7 adds the ability to parse HTML tables. It is quite simple:

In [19]:
html_table = """
<table>
    <thead><tr>
        <th>Name</th><th>Value</th>
    </tr></thead>
    <tbody><tr class="odd">
        <td>Pi</td><td>3.1415926535</td>
    </tr><tr class="even">
        <td>Euler</td><td>2.7182818284</td>
    </tr><tr class="odd">
        <td>Phi</td><td>1.6180339887</td>
    </tr></tbody>
</table>"""

df2 = Table.from_html(html_table, title='Numbers')
df2

'Name','Value'
'Pi',3.1415926535
'Euler',2.7182818284
'Phi',1.6180339887


Note that this method requires the [BeautifulSoup 4 module to be installed](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-beautiful-soup).
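
For a feel of what parsing an HTML table involves, here is a dependency-free sketch built on the standard library's `html.parser` instead of BeautifulSoup. It is a swapped-in technique for illustration only, the class name `TableCells` is hypothetical, and unlike the Lemuras output above it leaves all cell values as strings.

```python
from html.parser import HTMLParser

class TableCells(HTMLParser):
    """Collect the text of <th>/<td> cells, one list per <tr> (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
        elif tag in ('th', 'td'):
            if not self.rows:
                # Tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._in_cell = True
            self.rows[-1].append('')

    def handle_endtag(self, tag):
        if tag in ('th', 'td'):
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored
        if self._in_cell:
            self.rows[-1][-1] += data

parser = TableCells()
parser.feed("<table><tr><th>Name</th><th>Value</th></tr>"
            "<tr><td>Pi</td><td>3.1415926535</td></tr></table>")
parser.close()
header, *data = parser.rows
print(header, data)
```

The first collected row serves as the column names and the remaining rows as the data.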