<h1><img src="https://3yecy51kdipx3blyi37oute1-wpengine.netdna-ssl.com/wp-content/uploads/2016/11/influxdata_400x200.png" style="height: 80px;"/></h1>

InfluxDB is a time series database designed to handle high write and query loads of timestamped data, in use cases such as DevOps monitoring, application metrics, IoT sensor data, and real-time analytics.

## Key features

- An HTTP API and a CLI for writing and querying data.
- An expressive SQL-like query language (InfluxQL).
- Schemaless design: schemas don't have to be defined up front, and they can change over time.
- Tags allow series to be indexed for fast and efficient queries.
- Retention policies efficiently auto-expire stale data.
- Plugin support for other data ingestion protocols such as Graphite, collectd, and OpenTSDB.
- Continuous queries automatically compute aggregate data to make frequent queries more efficient.
- InfluxDB isn't fully CRUD: it prioritizes the performance of creating and reading data over updating and deleting it.
- The open source edition of InfluxDB runs on a single node; clustering and high availability are available only in InfluxDB Enterprise.
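A continuous query periodically runs a `SELECT ... INTO ... GROUP BY time(<interval>)` query and writes the aggregates to another measurement. The pure-Python sketch below mimics only the aggregation step on an in-memory list of `(unix_seconds, value)` pairs, so you can see what grouping by a fixed interval does. The function name `downsample_mean` is hypothetical, and the code never talks to InfluxDB.

```python
from collections import defaultdict
from statistics import mean


def downsample_mean(points, interval_s):
    """Average (unix_seconds, value) pairs into fixed, epoch-aligned buckets.

    Roughly what SELECT MEAN(...) ... GROUP BY time(<interval>) computes:
    every bucket starts at a multiple of interval_s since the Unix epoch.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % interval_s  # align to the epoch-based boundary
        buckets[bucket_start].append(value)
    # One output point per non-empty bucket, in time order
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


raw = [(0, 1.0), (1800, 3.0), (3600, 10.0), (5400, 20.0), (7300, 5.0)]
print(downsample_mean(raw, 3600))  # [(0, 2.0), (3600, 15.0), (7200, 5.0)]
```

Like a continuous query, the sketch writes nothing for intervals that contain no data.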

## Data structure
- Data in InfluxDB is organized into “time series”. A series is identified by its measurement, tag set, and retention policy.
- Each time series holds zero or more `points`, one per discrete sample of the metric. A point consists of:
    - a `time` (a timestamp)
    - a `measurement` (“Component temperature”, for example)
    - at least one key-value `field` (the measured value itself, e.g. “value=0.64” or “temperature=21.2”)
    - zero or more key-value `tags` holding metadata about the value
    
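Conceptually, a measurement is similar to a SQL table whose primary index is always `time`. `tags` and `fields` are effectively columns in that table. `tags` are indexed and `fields` are not, so queries that filter on tags are fast, while filtering on field values requires scanning the matching data.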
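The table analogy can be made concrete with a toy model. In the sketch below, a tag lookup goes through an index, but a field condition has to scan every row. The `Measurement` class is hypothetical and only illustrates the idea; InfluxDB's real storage engine works very differently.

```python
from collections import defaultdict


class Measurement:
    """Toy in-memory model of one measurement: tags are indexed, fields are not."""

    def __init__(self):
        self.rows = []                     # {"time": ..., "tags": {...}, "fields": {...}}
        self.tag_index = defaultdict(set)  # (tag key, tag value) -> row positions

    def insert(self, time, tags, fields):
        if not fields:
            raise ValueError("a point needs at least one field")
        self.rows.append({"time": time, "tags": dict(tags), "fields": dict(fields)})
        for key, value in tags.items():
            self.tag_index[(key, value)].add(len(self.rows) - 1)

    def where_tag(self, key, value):
        # Index lookup: only the matching rows are visited
        return [self.rows[i] for i in sorted(self.tag_index.get((key, value), ()))]

    def where_field(self, key, predicate):
        # No field index: every row must be scanned
        return [r for r in self.rows if key in r["fields"] and predicate(r["fields"][key])]


m = Measurement()
m.insert(1, {"region": "Earth"}, {"speed": 10.0})
m.insert(2, {"region": "Andoria"}, {"speed": 250.0})
m.insert(3, {"region": "Earth"}, {"speed": 300.0})
print([r["time"] for r in m.where_tag("region", "Earth")])             # [1, 3]
print([r["time"] for r in m.where_field("speed", lambda v: v > 100)])  # [2, 3]
```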

<table>
<thead>
<tr>
<th align="left">Element</th>
<th align="left">Optional/Required</th>
<th align="left">Description</th>
<th align="left">Type<br>(See <a href="https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_reference/#data-types">data types</a> for more information.)</th>
</tr>
</thead>

<tbody>
<tr>
<td align="left"><a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#measurement">Measurement</a></td>
<td align="left">Required</td>
<td align="left">The measurement name. InfluxDB accepts one measurement per point.</td>
<td align="left">String</td>
</tr>

<tr>
<td align="left"><a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#tag-set">Tag set</a></td>
<td align="left">Optional</td>
<td align="left">All tag key-value pairs for the point.</td>
<td align="left"><a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#tag-key">Tag keys</a> and <a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#tag-value">tag values</a> are both strings.</td>
</tr>

<tr>
<td align="left"><a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#field-set">Field set</a></td>
<td align="left">Required. Points must have at least one field.</td>
<td align="left">All field key-value pairs for the point.</td>
<td align="left"><a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#field-key">Field keys</a> are strings. <a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#field-value">Field values</a> can be floats, integers, strings, or Booleans.</td>
</tr>

<tr>
<td align="left"><a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#timestamp">Timestamp</a></td>
<td align="left">Optional. InfluxDB uses the server’s local nanosecond timestamp in UTC if the timestamp is not included with the point.</td>
<td align="left">The timestamp for the data point. InfluxDB accepts one timestamp per point.</td>
<td align="left">Unix nanosecond timestamp. Specify alternative precisions with the <a href="https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint">InfluxDB API</a>.</td>
</tr>
</tbody>
</table>
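These four elements make up one line of InfluxDB line protocol: `measurement[,tag_key=tag_value...] field_key=field_value[,...] [timestamp]`. The `influxdb` Python client performs an equivalent conversion internally when `write_points` is called. The standalone sketch below (hypothetical helpers `to_line_protocol` and `format_field_value`) shows the main rules. It escapes commas and spaces in measurement names, escapes commas, equals signs, and spaces in tag keys, tag values, and field keys, double-quotes string field values, and sorts tags by key.

```python
import math


def _escape(text, chars):
    """Backslash-escape every character of `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field_value(value):
    """Render a Python value with the line protocol syntax of its InfluxDB type."""
    if isinstance(value, bool):          # test bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"               # trailing "i" marks a 64-bit integer
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("NaN and infinity cannot be stored as field values")
        return repr(value)               # floats are the default numeric type
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'            # string field values are double-quoted
    raise TypeError(f"unsupported field value type: {type(value).__name__}")


def to_line_protocol(measurement, fields, tags=None, timestamp_ns=None):
    """Build one line: measurement[,tags] fields [timestamp]."""
    if not fields:
        raise ValueError("a point needs at least one field")
    line = _escape(measurement, ", ")
    for key in sorted(tags or {}):       # sorted tag keys, as InfluxDB recommends
        line += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    line += " " + ",".join(
        f"{_escape(key, ',= ')}={format_field_value(value)}" for key, value in fields.items()
    )
    if timestamp_ns is not None:         # omitted: the server uses its own clock
        line += f" {timestamp_ns}"
    return line


line = to_line_protocol(
    "starfleet_01",
    {"speed": 0.64, "consumption": 120, "status_b": True, "remark": "warp 9"},
    tags={"region": "Deep Space Nine", "cmdt": "Kirk"},
    timestamp_ns=1566248605000000000,
)
print(line)
# starfleet_01,cmdt=Kirk,region=Deep\ Space\ Nine speed=0.64,consumption=120i,status_b=true,remark="warp 9" 1566248605000000000
```

The field and tag values here are made up for illustration.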


## Data types

<table>
<thead>
<tr>
<th align="left">Datatype</th>
<th align="left">Element(s)</th>
<th align="left">Description</th>
</tr>
</thead>

<tbody>
<tr>
<td align="left">Float</td>
<td align="left">Field values</td>
<td align="left">IEEE-754 64-bit floating-point numbers. This is the default numerical type. Examples: <code>1</code>, <code>1.0</code>, <code>1.e+78</code>, <code>1.E+78</code>.</td>
</tr>

<tr>
<td align="left">Integer</td>
<td align="left">Field values</td>
<td align="left">Signed 64-bit integers (-9223372036854775808 to 9223372036854775807). Specify an integer with a trailing <code>i</code> on the number. Example: <code>1i</code>.</td>
</tr>

<tr>
<td align="left">String</td>
<td align="left">Measurements, tag keys, tag values, field keys, field values</td>
<td align="left">Length limit 64KB.</td>
</tr>

<tr>
<td align="left">Boolean</td>
<td align="left">Field values</td>
<td align="left">Stores TRUE or FALSE values.<br><br>TRUE write syntax: <code>[t, T, true, True, TRUE]</code>.<br><br>FALSE write syntax: <code>[f, F, false, False, FALSE]</code>.</td>
</tr>

<tr>
<td align="left">Timestamp</td>
<td align="left">Timestamps</td>
<td align="left">Unix nanosecond timestamp. Specify alternative precisions with the <a href="https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint">InfluxDB API</a>. The minimum valid timestamp is <code>-9223372036854775806</code> or <code>1677-09-21T00:12:43.145224194Z</code>. The maximum valid timestamp is <code>9223372036854775806</code> or <code>2262-04-11T23:47:16.854775806Z</code>.</td>
</tr>
</tbody>
</table>
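The numeric limits in this table are easy to check before writing. The sketch below uses hypothetical helper names. It converts a timezone-aware `datetime` to integer Unix nanoseconds, rejects values outside the valid timestamp range, truncates to a coarser precision, and checks that an integer fits the signed 64-bit field range. The unit labels in `NS_PER_UNIT` belong to this sketch only. They are not the precision strings of the HTTP API or the Python client.

```python
from datetime import datetime, timezone

# Valid timestamp range from the table above, in nanoseconds since the Unix epoch
MIN_TS_NS = -9223372036854775806
MAX_TS_NS = 9223372036854775806

# Signed 64-bit range of integer field values
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

# Nanoseconds per unit (labels chosen for this sketch)
NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ns(dt):
    """Convert a timezone-aware datetime to integer Unix nanoseconds within InfluxDB's range."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone so local time is not mistaken for UTC")
    delta = dt - EPOCH
    # Integer arithmetic on the timedelta parts avoids float rounding at nanosecond scale
    ns = (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000
    if not MIN_TS_NS <= ns <= MAX_TS_NS:
        raise ValueError(f"{dt.isoformat()} is outside InfluxDB's valid timestamp range")
    return ns


def to_precision(ns, unit):
    """Truncate a nanosecond timestamp to a coarser unit (floors toward negative infinity)."""
    return ns // NS_PER_UNIT[unit]


def fits_int64(value):
    """True if an integer field value fits InfluxDB's signed 64-bit range."""
    return INT64_MIN <= value <= INT64_MAX


ts = to_unix_ns(datetime(2019, 8, 19, 21, 3, 25, tzinfo=timezone.utc))
print(ts)                     # 1566248605000000000
print(to_precision(ts, "s"))  # 1566248605
print(fits_int64(2**63))      # False
```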

The examples below use the `influxdb` Python client ([documentation](https://influxdb-python.readthedocs.io/en/latest/index.html)) against an InfluxDB 1.x server.

In [1]:
from datetime import (datetime, timedelta)
from random import (choice, randint, random, uniform, lognormvariate)
from influxdb import (InfluxDBClient, DataFrameClient)

In [2]:
# Connection settings (placeholder host and credentials)
host = 'db.influxdb.app.com'
port = 8086

# Admin account: needed to create databases and retention policies
user = 'admin'
password = 'supersecretpassword'
dbname = 'example'

# Non-admin account for day-to-day reads and writes
dbuser = 'telegraf'
dbuser_password = 'secretpassword'

# Instantiate a connection to InfluxDB, with `dbname` as the default database
client = InfluxDBClient(host, port, user, password, dbname)

In [14]:
# Remove the policy left over from a previous run of this notebook
client.drop_retention_policy('custom_policy', dbname)

In [64]:
client.create_database(dbname)

In [65]:
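# Keep data for 30 days and make this the default policy of `dbname`;
# the replication factor (3) has no effect on a single-node open source server
client.create_retention_policy('custom_policy', '30d', 3, default=True)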
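The `'30d'` passed to `create_retention_policy` is an InfluxQL duration literal. The sketch below uses two hypothetical helpers, `parse_duration` and `is_expired`, and only the standard library. It parses the simple single-unit form (seconds to weeks, plus `INF` for "keep forever") and shows which points a 30-day policy would eventually drop. Treat it as an approximation: InfluxDB enforces retention by dropping whole shard groups, so points can outlive the duration by up to one shard group duration.

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds per unit for single-unit duration literals (a subset of InfluxQL's units)
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86_400, "w": 604_800}


def parse_duration(text):
    """Parse a literal such as '30d' or '12h'; 'INF' means infinite retention (None)."""
    if text.upper() == "INF":
        return None
    match = re.fullmatch(r"(\d+)([smhdw])", text)
    if match is None:
        raise ValueError(f"unsupported duration literal: {text!r}")
    amount, unit = match.groups()
    return timedelta(seconds=int(amount) * _UNIT_SECONDS[unit])


def is_expired(point_time, duration, now):
    """True if a point is older than the retention duration (per-point approximation)."""
    return duration is not None and point_time < now - duration


now = datetime(2019, 8, 19, tzinfo=timezone.utc)
thirty_days = parse_duration("30d")
print(thirty_days)                                                               # 30 days, 0:00:00
print(is_expired(datetime(2019, 7, 1, tzinfo=timezone.utc), thirty_days, now))  # True
print(is_expired(datetime(2019, 8, 1, tzinfo=timezone.utc), thirty_days, now))  # False
```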

In [66]:
client.switch_user(dbuser, dbuser_password)

In [67]:
(datetime.now() - timedelta(seconds=10)).strftime("%Y-%m-%dT%H:%M:%S")

'2019-08-19T21:03:25'

In [170]:
def feed_db(n):
    """Write n random points to starfleet_01, spaced two seconds apart up to the present."""
    # The client treats timestamps without a timezone as UTC, so build them from UTC time
    now = datetime.utcnow()
    json_body = [
        {
            "measurement": "starfleet_01",
            "tags": {
                "cmdt": choice(("Archer", "Kirk", "Kruge")),
                "region": choice(("Andoria", "Deep Space Nine", "Earth", "Genesis")),
                "spacecraft": f"NX-17{randint(1, 10):02d}"
            },
            "time": (now - timedelta(seconds=2*(n - i))).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "fields": {
                "speed": lognormvariate(10, 3),
                "consumption": uniform(0, 300),
                "pressure_a": 3 + random(),
                "status_b": choice((True, False))
            }
        } for i in range(n)
    ]
    # batch_size makes the client send the points in several write requests
    client.write_points(json_body, batch_size=100_000)
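`feed_db()` builds all n point dictionaries in memory before `write_points` splits them into batches. For very large n, a generator combined with a small chunking helper keeps memory use flat. The sketch below assumes the same point layout as `feed_db()` (trimmed to one tag and one field). The names `generate_points` and `iter_batches` are hypothetical, and the final loop is commented out because it needs the live `client` from above. Python 3.12 also provides `itertools.batched`, which yields tuples instead of lists.

```python
from datetime import datetime, timedelta
from itertools import islice
from random import choice, random


def generate_points(n, now):
    """Lazily yield n points shaped like those of feed_db(), two seconds apart."""
    for i in range(n):
        yield {
            "measurement": "starfleet_01",
            "tags": {"region": choice(("Andoria", "Deep Space Nine", "Earth", "Genesis"))},
            "time": (now - timedelta(seconds=2 * (n - i))).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "fields": {"pressure_a": 3 + random()},
        }


def iter_batches(iterable, size):
    """Yield lists of at most `size` items without materializing the whole input."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


now = datetime(2019, 8, 19, 21, 3, 25)  # fixed UTC time so the example is repeatable
batches = list(iter_batches(generate_points(5, now), 2))
print([len(b) for b in batches])  # [2, 2, 1]
print(batches[0][0]["time"])      # 2019-08-19T21:03:15Z

# With the live client (not run here):
# for batch in iter_batches(generate_points(1_000_000, datetime.utcnow()), 100_000):
#     client.write_points(batch)
```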

In [174]:
def feed_db_array(n):
    """Write n points to starfleet_02 whose "curve_b" array is stored as one field per element."""
    # InfluxDB has no array field type, so element i goes into the field key "curve_b[i]"
    now = datetime.utcnow()  # naive timestamps are treated as UTC by the client
    json_body = [
        {
            "measurement": "starfleet_02",
            "tags": {
                "cmdt": choice(("Archer", "Kirk", "Kruge")),
                "region": choice(("Andoria", "Deep Space Nine", "Earth", "Genesis")),
                "spacecraft": f"NX-17{randint(1, 10):02d}"
            },
            "time": (now - timedelta(seconds=2*(n - i))).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "fields": {
                "curve_b[0]": 10 + random(),
                "curve_b[1]": choice((15 + random(), 4 + random())),
                "curve_b[2]": 20 + random(),
            }
        } for i in range(n)
    ]
    client.write_points(json_body, batch_size=100_000)
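Field values can only be floats, integers, strings, or Booleans, which is why `feed_db_array()` spreads the curve over the field keys `curve_b[0]`, `curve_b[1]`, and so on. The sketch below provides a pair of hypothetical helpers that apply the same naming convention. One flattens a list before writing; the other rebuilds lists from a returned row, such as a dict yielded by `ResultSet.get_points()`.

```python
import re

# Matches indexed field keys such as "curve_b[2]"
_INDEXED_KEY = re.compile(r"^(?P<name>.+)\[(?P<index>\d+)\]$")


def flatten_array(name, values):
    """Spread a list over indexed field keys: curve_b -> curve_b[0], curve_b[1], ..."""
    return {f"{name}[{i}]": value for i, value in enumerate(values)}


def unflatten_arrays(fields):
    """Rebuild lists from indexed field keys; other keys are returned unchanged."""
    arrays, plain = {}, {}
    for key, value in fields.items():
        match = _INDEXED_KEY.match(key)
        if match:
            arrays.setdefault(match["name"], {})[int(match["index"])] = value
        else:
            plain[key] = value
    for name, items in arrays.items():
        # Missing indexes (e.g. a field absent from a row) become None
        plain[name] = [items.get(i) for i in range(max(items) + 1)]
    return plain


print(flatten_array("curve_b", [10.5, 15.2, 20.1]))
# {'curve_b[0]': 10.5, 'curve_b[1]': 15.2, 'curve_b[2]': 20.1}
print(unflatten_arrays({"curve_b[1]": 2, "curve_b[0]": 1, "curve_b[2]": 3, "pressure_a": 3.8}))
# {'pressure_a': 3.8, 'curve_b': [1, 2, 3]}
```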


In [172]:
feed_db(1_000_000)

In [175]:
feed_db_array(1_000_000)

In [176]:
client.query(query='select count(*) from starfleet_01')

ResultSet({'('starfleet_01', None)': [{'time': '1970-01-01T00:00:00Z', 'count_consumption': 1000000, 'count_pressure_a': 1000000, 'count_speed': 1000000, 'count_status_b': 1000000}]})

In [177]:
from IPython.core.magic import (register_line_cell_magic)

@register_line_cell_magic
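def influxql(line, cell=None):
    """Run InfluxQL through `client`: %influxql for one statement, %%influxql for one per line."""
    if cell is None:
        sqlstr = line
    else:
        # Join the cell's non-empty lines into one ";"-separated request
        statements = (op.strip().strip(";") for op in cell.splitlines() if op.strip())
        sqlstr = ";".join(statements) + ";"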
    return client.query(query=sqlstr)

In [183]:
%influxql SELECT COUNT(*) FROM "starfleet_02" GROUP_BY "region"

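InfluxDBClientError: 400: {"error":"error parsing query: found GROUP_BY, expected ; at line 1, char 37"}

The query fails because InfluxQL spells the clause as two words: `GROUP BY "region"`.

Note: the queries below run against a `starfleet` measurement written in earlier sessions, whose schema differs from `starfleet_01` and `starfleet_02` (compare the `SHOW TAG KEYS` and `SELECT * from starfleet LIMIT 1` output at the end).
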
In [126]:
%influxql SELECT COUNT(DISTINCT("spacecraft")) FROM "starfleet"

ResultSet({'('starfleet', None)': [{'time': '1970-01-01T00:00:00Z', 'count': 100}]})

In [128]:
%influxql SELECT MODE("region") FROM "starfleet"

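ResultSet({})

The result is empty because `region` is a tag in this measurement, and InfluxQL functions such as `MODE()` operate on fields.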

In [129]:
%influxql SELECT SPREAD("pressure_a") FROM "starfleet"

ResultSet({'('starfleet', None)': [{'time': '1970-01-01T00:00:00Z', 'spread': 0.9999997563568188}]})

In [145]:
res = %influxql SELECT * FROM "starfleet" WHERE "pressure_a" > 3.5

In [146]:
points = res.get_points(tags={"region": "Andoria"})

In [147]:
from pandas import DataFrame

In [148]:
df = DataFrame(points)

In [149]:
df.head()

| | consumption | curve_b[0] | curve_b[1] | curve_b[2] | pressure_a | region | spacecraft | speed | status_b | time | who |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 37.925259 | 1 | 2 | 3 | 3.85028 | Andoria | NX-80020 | 9.359294 | False | 2019-07-27T18:08:22Z | James T. Kirk |
| 1 | 93.973752 | 1 | 2 | 3 | 3.748505 | Andoria | NX-80062 | 17038.83 | True | 2019-07-27T18:08:40Z | Spock |
| 2 | 120.489248 | 1 | 2 | 3 | 3.605678 | Andoria | NX-80029 | 4525.491 | True | 2019-07-27T18:09:42Z | James T. Kirk |
| 3 | 211.196781 | 1 | 2 | 3 | 3.641109 | Andoria | NX-80001 | 587.4285 | True | 2019-07-27T18:09:44Z | James T. Kirk |
| 4 | 245.707595 | 1 | 2 | 3 | 3.645298 | Andoria | NX-80077 | 1445701.0 | True | 2019-07-27T18:10:10Z | James T. Kirk |


In [95]:
%influxql SHOW QUERIES;

ResultSet({'('results', None)': [{'qid': 812, 'query': 'SHOW QUERIES', 'database': 'example', 'duration': '160µs', 'status': 'running'}]})

In [227]:
client.query(query='select speed from starfleet WHERE customer="Spock" LIMIT 10')

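ResultSet({})

The result is empty because InfluxQL reads the double-quoted `"Spock"` as an identifier rather than a string, so the condition compares `customer` with a nonexistent key named `Spock`. String literals in a `WHERE` clause must be single-quoted: `WHERE "customer" = 'Spock'`.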
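InfluxQL double-quotes identifiers (measurements, tag keys, field keys) and single-quotes string literals. When building queries in Python, two small helpers keep the two cases apart. The names `quote_identifier` and `quote_string` below are hypothetical. Their escaping of backslashes, quotes, and newlines is an assumption modeled on InfluxQL's quoting rules, and untrusted input should still be validated.

```python
def quote_identifier(name):
    """Double-quote an InfluxQL identifier such as a measurement, tag key, or field key."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def quote_string(value):
    """Single-quote an InfluxQL string literal, such as a tag value in a WHERE clause."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'").replace("\n", "\\n")
    return f"'{escaped}'"


query = (
    f"SELECT {quote_identifier('speed')} FROM {quote_identifier('starfleet')} "
    f"WHERE {quote_identifier('customer')} = {quote_string('Spock')} LIMIT 10"
)
print(query)        # SELECT "speed" FROM "starfleet" WHERE "customer" = 'Spock' LIMIT 10

show_values = f"SHOW TAG VALUES WITH KEY = {quote_identifier('customer')}"
print(show_values)  # SHOW TAG VALUES WITH KEY = "customer"
```

Either string can then be passed to `client.query()`.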

In [106]:
client.query('SHOW SERIES')

ResultSet({'('results', None)': [{'key': 'starfleet,region=Andoria,who=James\\ T.\\ Kirk'}, {'key': 'starfleet,region=Andoria,who=Spock'}, {'key': 'starfleet,region=Deep\\ Space\\ Nine,who=James\\ T.\\ Kirk'}, {'key': 'starfleet,region=Deep\\ Space\\ Nine,who=Spock'}, {'key': 'starfleet,region=Earth,who=James\\ T.\\ Kirk'}, {'key': 'starfleet,region=Earth,who=Spock'}]})
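Each `key` returned by `SHOW SERIES` is a series key: the measurement followed by its tag set, with commas, equals signs, and spaces backslash-escaped. The Python repr above doubles the backslashes, so the actual key reads `who=James\ T.\ Kirk`. The sketch below uses hypothetical helpers to split such a key back into a measurement name and a tag dictionary. The commented loop assumes the live `client` and that each row yielded by `get_points()` carries a `key` entry, as in the output above.

```python
def split_unescaped(text, sep):
    """Split on `sep` unless it is preceded by a backslash (escape pairs are kept)."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape pair intact for now
            i += 2
            continue
        if ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def unescape(text):
    """Drop the backslash in front of each escaped character."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            i += 1  # skip the backslash, keep the escaped character
        out.append(text[i])
        i += 1
    return "".join(out)


def parse_series_key(key):
    """Turn 'measurement,tag=value,...' into (measurement, {tag: value})."""
    measurement, *pairs = split_unescaped(key, ",")
    tags = {}
    for pair in pairs:
        tag_key, tag_value = split_unescaped(pair, "=")
        tags[unescape(tag_key)] = unescape(tag_value)
    return unescape(measurement), tags


print(parse_series_key("starfleet,region=Deep\\ Space\\ Nine,who=James\\ T.\\ Kirk"))
# ('starfleet', {'region': 'Deep Space Nine', 'who': 'James T. Kirk'})

# With the live client (not run here):
# for row in client.query('SHOW SERIES').get_points():
#     print(parse_series_key(row['key']))
```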

In [229]:
result = client.query(query='select speed from starfleet WHERE time > now() - 1d LIMIT 10;')
result

ResultSet({'('starfleet', None)': [{'time': '2019-08-18T19:22:42Z', 'speed': 306.6993676747536}]})

In [213]:
dfClient = DataFrameClient(host, port, user, password, dbname)

In [232]:
dfClient.query(query='select * from starfleet WHERE time > now() - 1d LIMIT 10;')

defaultdict(list,
            {'starfleet':                                      count_consumption  count_pressure_µ  \
             2019-08-17 19:30:12.103923669+00:00              43005             43005   
             
                                                  count_spacecraft  count_speed  \
             2019-08-17 19:30:12.103923669+00:00             43005        43005   
             
                                                  count_status_a  
             2019-08-17 19:30:12.103923669+00:00           43005  })

In [207]:
result = client.query(query='SELECT * FROM "h2o_feet","h2o_pH"')

In [208]:
result

ResultSet({})

In [206]:
client.query("SHOW TAG VALUES WITH KEY = 'customer'")

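InfluxDBClientError: 400: {"error":"error parsing query: found customer, expected identifier at line 1, char 27"}

In `SHOW TAG VALUES`, the tag key is an identifier, so it must be double-quoted (or left unquoted) rather than single-quoted: `SHOW TAG VALUES WITH KEY = "customer"`.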


In [201]:
client.query("SHOW TAG KEYS")

ResultSet({'('starfleet', None)': [{'tagKey': 'customer'}, {'tagKey': 'region'}]})

In [199]:
client.query("SELECT * from starfleet LIMIT 1")

ResultSet({'('starfleet', None)': [{'time': '2019-08-17T14:36:18Z', 'consumption': 126.73624498383099, 'customer': 'Spock', 'pressure_µ': 3.0421040377041475, 'region': 'Vulcan', 'spacecraft': 'NX-80076', 'speed': 2375.1424165046697, 'status_a': False}]})