## DataFramePlot

### Set environment variables

To load this notebook's modules when running it from outside the repository folder, set **PATH** below to the repository location, e.g. `C:/DataFramePlot`:

In [None]:
PATH = "/path/to/DataFramePlot"  # <-- optional if running from the repository folder

In [None]:
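import importlib.util
import os

# Fall back to the current working directory if PATH is not a valid folder
if not os.path.isdir(PATH):
    PATH = os.getcwd()
PATH = os.path.realpath(PATH)

# Execute the repository's __init__.py so its modules can be imported below
spec = importlib.util.spec_from_file_location("__init__", os.path.join(PATH, "__init__.py"))
init = importlib.util.module_from_spec(spec)
spec.loader.exec_module(init)

# Notebook settings: inline figures and automatic reloading of edited modules
%matplotlib inline
%load_ext autoreload
%autoreload 2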
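
The cell above runs the repository's `__init__.py`, presumably so that `tagging` becomes importable. If that file only adjusts the import path, putting the resolved folder on `sys.path` is a simpler equivalent. A minimal sketch, assuming the `tagging` module lives in the repository root; `resolve_repo_path` and `add_repo_to_sys_path` are hypothetical helpers, not part of the repository:

```python
import os
import sys


def resolve_repo_path(path: str) -> str:
    """Return the canonical repository path, falling back to the working directory."""
    # Same rule as the notebook: an invalid PATH means "use the current folder"
    if not os.path.isdir(path):
        path = os.getcwd()
    return os.path.realpath(path)


def add_repo_to_sys_path(path: str) -> str:
    """Make modules in the repository folder importable (hypothetical helper)."""
    repo = resolve_repo_path(path)
    # Prepend so the repository's modules take precedence over installed ones
    if repo not in sys.path:
        sys.path.insert(0, repo)
    return repo


# Usage: afterwards, `from tagging import df_load` would resolve against the repo
repo = add_repo_to_sys_path("/path/to/DataFramePlot")
```

Unlike executing `__init__.py`, this only changes where Python looks for modules, so any other setup that file performs would be skipped.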

### Import functions

In [None]:
import plotly.offline as py

from tagging import df_load, df_write, tag_df, tagged_bar, tagged_series

# Enable Plotly figures in the notebook (connected=True loads plotly.js from a CDN)
py.init_notebook_mode(connected=True)

### Load data from file

Accepts plain-text (`CSV`, `TAB`, `TXT`) and Excel (`XLS`, `XLSX`) files.

In [None]:
file_name = ""  # path to the input file (required)

df = df_load(file_name)
df.head(1)  # preview the first row
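
How `df_load` chooses a reader is not shown here. As a rough illustration of extension-based dispatch, the sketch below reads the plain-text formats with Python's `csv` module and rejects Excel files, which need a spreadsheet reader such as `pandas.read_excel`. The delimiter assigned to each extension (including treating `TXT` as tab-separated) and the helper `load_rows` are assumptions, not the behavior of `df_load`:

```python
import csv
import os
from typing import Dict, List

# Assumed delimiter per plain-text extension (hypothetical mapping)
DELIMITERS = {".csv": ",", ".tab": "\t", ".txt": "\t"}
EXCEL_EXTENSIONS = {".xls", ".xlsx"}


def load_rows(file_name: str) -> List[Dict[str, str]]:
    """Read a delimited text file into a list of row dictionaries."""
    ext = os.path.splitext(file_name)[1].lower()
    if ext in EXCEL_EXTENSIONS:
        raise ValueError(f"{ext} files need an Excel reader such as pandas.read_excel")
    if ext not in DELIMITERS:
        raise ValueError(f"unsupported extension: {ext!r}")
    # newline="" lets the csv module handle line endings itself
    with open(file_name, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=DELIMITERS[ext]))
```

Lower-casing the extension means `DATA.CSV` and `data.csv` are treated alike.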

### Tag data

Set the parameters below to tag rows of the data frame that match the given keywords. **Note:** only string values are considered by default.

In [None]:
keywords = ""             # comma-separated keywords to match (required)
tag_name = ""             # category or tag name to assign (optional)
columns  = ""             # comma-separated columns to search (optional)

case_sensitive = False    # match keywords case-sensitively (AaBbCc)
invert_match   = False    # tag only rows that match none of the keywords
whole_words    = False    # match whole words only (if False, 'book' also matches 'books')

tag_column     = "tag"    # name of the existing or new column that stores tags (required)
output_folder  = "TAGGED" # output directory for new files (optional)

df = tag_df(df, keywords, tag_name, columns, tag_column, case_sensitive, invert_match, whole_words)
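
How `tag_df` applies these options internally is not documented here. A minimal sketch of one plausible reading of the three flags, using Python's `re` module on a plain list of values instead of a data frame; `build_pattern` and `tag_values` are hypothetical helpers:

```python
import re
from typing import Iterable, List, Optional


def build_pattern(keywords: str, case_sensitive: bool = False,
                  whole_words: bool = False) -> "re.Pattern[str]":
    """Compile comma-separated keywords into one regular expression."""
    # Split "book, film" into ["book", "film"], dropping empty entries
    terms = [k.strip() for k in keywords.split(",") if k.strip()]
    if not terms:
        raise ValueError("at least one keyword is required")
    # Escape each term so characters like '.' or '+' match literally
    alternatives = "|".join(re.escape(t) for t in terms)
    if whole_words:
        # \b boundaries stop 'book' from matching inside 'books'
        alternatives = r"\b(?:%s)\b" % alternatives
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(alternatives, flags)


def tag_values(values: Iterable[object], keywords: str, tag_name: str,
               case_sensitive: bool = False, invert_match: bool = False,
               whole_words: bool = False) -> List[Optional[str]]:
    """Return tag_name for matching values (non-matching if inverted), else None."""
    pattern = build_pattern(keywords, case_sensitive, whole_words)
    tags = []
    for value in values:
        # Only string values can match; other types count as non-matching
        matched = isinstance(value, str) and pattern.search(value) is not None
        tags.append(tag_name if matched != invert_match else None)
    return tags
```

For example, `tag_values(["Books", "a book", "film"], "book", "reading", whole_words=True)` returns `[None, "reading", None]`, while the default partial matching would also tag `"Books"`.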

### Plot bar chart

Generate a bar chart comparing the number of rows per tag. **Note:** untagged rows are excluded by default.

In [None]:
tagged_bar(df, y=tag_column, output_folder=output_folder, untagged=False, inline=True)
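
The bar heights are simply row counts per tag. A minimal sketch of that counting step with `collections.Counter`, assuming an empty or `None` tag marks an untagged row; `count_tags` and its `untagged_label` parameter are hypothetical, not part of `tagged_bar`:

```python
from collections import Counter
from typing import Iterable, Optional


def count_tags(tags: Iterable[Optional[str]], untagged: bool = False,
               untagged_label: str = "(untagged)") -> Counter:
    """Count rows per tag; empty tags are skipped unless untagged=True."""
    counts = Counter()
    for tag in tags:
        if tag:  # None or "" means the row was not tagged
            counts[tag] += 1
        elif untagged:
            counts[untagged_label] += 1
    return counts
```

`counts.most_common()` then yields `(tag, count)` pairs in descending order, ready to be drawn as bars by any plotting library.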

### Plot time series

Generate a line chart comparing tags over time. Requires `date_column` to name a column containing valid date or time values. **Note:** untagged rows are excluded by default.

In [None]:
date_column = ""  # name of the column holding date/time values (required)

In [None]:
tagged_series(df, x=date_column, y=tag_column, output_folder=output_folder, untagged=False, inline=True)
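
A line chart of tags by date amounts to counting rows per tag and per date, then sorting each series chronologically. A minimal stdlib sketch, assuming dates are strings in ISO `YYYY-MM-DD` form and rows arrive as `(date, tag)` pairs; `tag_counts_by_date` is a hypothetical helper, not part of `tagging`:

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple


def tag_counts_by_date(rows: Iterable[Tuple[str, Optional[str]]],
                       date_format: str = "%Y-%m-%d") -> Dict[str, Dict[str, int]]:
    """Map each tag to {date: count}, skipping untagged rows and unparsable dates."""
    series: Dict[str, Counter] = defaultdict(Counter)
    for raw_date, tag in rows:
        if not tag:
            continue  # untagged rows are left out, as in the notebook call
        try:
            day = datetime.strptime(raw_date, date_format).date()
        except (TypeError, ValueError):
            continue  # missing value or not a date in the assumed format
        series[tag][day.isoformat()] += 1
    # ISO date strings sort chronologically, so each series can be drawn as a line
    return {tag: dict(sorted(counts.items())) for tag, counts in series.items()}
```

Skipping unparsable dates silently keeps the sketch short; a real loader might prefer to report them instead.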

### Write tagged data to file

Supports plain-text `CSV` (comma-separated values) and Excel (`XLS`/`XLSX`) formats.

In [None]:
import os

output_file = "tagged_data.csv"  # or "tagged_data.xls" for Excel output

df_write(df, os.path.join(output_folder, output_file))
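
For CSV output, the write step reduces to building the file path and writing the rows. A minimal sketch using the `csv` module, assuming rows are dictionaries that share the same keys; `write_rows` is hypothetical, and it creates `output_folder` if missing (whether `df_write` does so is not stated here):

```python
import csv
import os
from typing import Dict, List


def write_rows(rows: List[Dict[str, object]], output_folder: str, output_file: str) -> str:
    """Write row dictionaries to output_folder/output_file as CSV and return the path."""
    if not output_file.lower().endswith(".csv"):
        raise ValueError("this sketch only writes CSV; Excel output needs another writer")
    # Create the folder if missing and build the path portably
    os.makedirs(output_folder, exist_ok=True)
    path = os.path.join(output_folder, output_file)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # Column order follows the keys of the first row
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]) if rows else [])
        writer.writeheader()
        writer.writerows(rows)
    return path
```

`os.path.join` produces the right separator on each platform, which `"%s/%s"` formatting does not guarantee.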

#### Compress output → `output.zip`

In [None]:
!zip -r output.zip {output_folder}
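
The `zip` command only descends into a folder when given `-r`, and it may not be installed at all (e.g. on Windows). The standard library can build the same archive; a minimal sketch with `shutil.make_archive`, where `zip_folder` is a hypothetical helper:

```python
import os
import shutil


def zip_folder(folder: str, archive_base: str = "output") -> str:
    """Zip `folder` recursively into <archive_base>.zip and return the archive path."""
    # root_dir/base_dir keep the folder name (e.g. TAGGED/) as the top-level entry
    parent = os.path.dirname(os.path.abspath(folder))
    name = os.path.basename(os.path.normpath(folder))
    return shutil.make_archive(archive_base, "zip", root_dir=parent, base_dir=name)
```

Calling `zip_folder(output_folder)` in the notebook would write `output.zip` to the current working directory, matching the download link below.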

### [Download output files](output.zip)
