# LUX

A picture is worth a thousand words, even more so when it comes to data-centric projects. Data exploration is the first step in any machine learning project, and it is pivotal to how well the rest of the project turns out. Although libraries like Plotly and Seaborn provide a huge collection of plots and options, they require the user to first decide what to visualize and what the visualization should look like. This slows exploration down and helps make it one of the most time-consuming parts of the machine learning life cycle. What if you could get visualizations recommended to you instead? Lux is a Python package created by the folks at RISELab that aims to make data exploration easier and quicker with its simple one-line syntax and visualization recommendations. As the developers put it, “Lux is built on the philosophy that users should always be able to visualize anything they want without having to think about how the visualization should look like”.

# Code Implementation

## Installation

Install Lux from PyPI

In [None]:
!python -m pip install pip --upgrade --user -q
!python -m pip install numpy pandas seaborn matplotlib scipy scikit-learn statsmodels --user -q

In [None]:
!python -m pip install lux-api --user -q

In [None]:
import IPython
IPython.Application.instance().kernel.do_shutdown(True)
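
After the kernel restarts, it can help to confirm that the packages are importable before going further. The following is a minimal sketch using only the standard library. The helper name `is_installed` is made up for this example, and `lux` and `luxwidget` are assumed to be the import names provided by the `lux-api` package and its widget.

```python
import importlib.util

def is_installed(module_name):
    """Return True if `module_name` can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None

# Report which of the packages used in this tutorial are visible to the kernel.
for name in ["pandas", "lux", "luxwidget"]:
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

If any entry is reported as missing, rerunning the corresponding install cell is usually the first thing to try.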

Install and activate the Lux notebook extension (lux-widget) included in the package.

For VS Code and Jupyter Notebook

 jupyter nbextension install --py luxwidget


 jupyter nbextension enable --py luxwidget 

For JupyterLab

 jupyter labextension install @jupyter-widgets/jupyterlab-manager

 
 jupyter labextension install luxwidget 

In [None]:
!jupyter labextension install @jupyter-widgets/jupyterlab-manager

!jupyter labextension install luxwidget 

Check other methods of installation [here](https://lux-api.readthedocs.io/en/latest/source/getting_started/installation.html#manual-installation-dev-setup).

Lux is designed to be tightly integrated with Pandas and can be used as-is, without modifying your existing Pandas code. To enable Lux, simply add `import lux` along with your Pandas import statement.

In [None]:
import pandas as pd
import lux

Lux preserves the Pandas dataframe semantics -- which means that you can apply any command from Pandas's API to the dataframes in Lux and expect the same behavior. For example, we can load the dataset via standard Pandas `read_*` commands.

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/Aditya1001001/English-Premier-League/master/EDA_data.csv")

In [None]:
df.columns

Lux is built on the philosophy that generating useful visualizations should be as simple as printing out a dataframe. 
When you print out the dataframe in the notebook, you should see the default Pandas table display with an additional Toggle button. 

By clicking on the Toggle button, you can now explore the data visually through Lux. You should see three tabs of visualizations recommended to you. 

In [None]:
df

### Visualizing Dataframes with Recommendations

You have generated your first set of visualizations through Lux!


Recommendations highlight interesting patterns and trends in your dataframe. Lux offers different types of recommendations, known as _analytical actions_. These analytical actions represent different analyses that can be performed on the data.








Lux recommends a set of actions depending on the content of your dataframe and your analysis goals and interests (described later). 




## Specifying Intent in Lux

Lux provides a flexible language for communicating your analysis intent to the system, so that Lux can provide better and more relevant recommendations to you. In this tutorial, we will see different ways of specifying the intent, including the attributes and values that you are interested or not interested in, enumeration specifiers, as well as any constraints on the visualization encoding.

The primary way to set the current intent associated with a dataframe is by setting the `intent` property of the dataframe and providing a list of specifications as input. We will first describe how intent can be specified through convenient shorthand descriptions as string inputs, then we will describe advanced usage via the `lux.Clause` object.


### Specifying attributes of interest

You can indicate that you are interested in an attribute, let's say `value_eur`.

In [None]:
df.intent = ['value_eur']
df

- **Enhance** adds an additional attribute to the intended visualization. Enhance lets users compare the effect of the added variable on the intended visualization. For example, with an intent of `MedianEarnings` (an attribute from the college dataset used in the Lux documentation), Enhance displays visualizations involving C' = {MedianEarnings, *added attribute*}, including:

    - {MedianEarnings, **Expenditure**}
    - {MedianEarnings, **AverageCost**}
    - {MedianEarnings, **AverageFacultySalary**}.

- **Filter** adds an additional filter to the intended visualization. Filter lets users browse through what the intended visualization looks like for different subsets of the data. For the same `MedianEarnings` intent, Filter displays visualizations involving C' = {MedianEarnings, *added filter*}, including:

    - {MedianEarnings, **FundingModel=Public**}
    - {MedianEarnings, **Region=Southeast**}
    - {MedianEarnings, **Region=Great Lakes**}.

You might be interested in multiple attributes. For instance, you might want to look at both `overall` and `value_eur`. When multiple clauses are specified, Lux applies all the clauses in the intent and searches for visualizations that are relevant to `overall` **and** `value_eur`.

In [None]:
df.intent = ['overall','value_eur']
df

Let's say that in addition to `overall`, you are interested in looking at a list of attributes that are related to different performance measures, such as `Passes per match` or `Goals per match`, and how they break down with respect to `Position`.

You can specify a list of desired attributes separated by the `|` symbol, which indicates an `OR` relationship between the attributes in the list. If multiple clauses are specified, Lux automatically creates combinations of the specified attributes.

In [None]:
possible_attributes = "Passes per match|Goals per match|overall|Tackles"
df.intent = [possible_attributes,"Position"]
df

Alternatively, you could also provide the specification as a list: 

In [None]:
possible_attributes = ['Passes per match','Goals per match','overall','Tackles']
df.intent = [possible_attributes,"Position"]
df
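
Conceptually, the `|` syntax (or the equivalent list form) asks for one visualization per combination of choices across the clauses. The sketch below illustrates that idea with `itertools.product`. It is not Lux's actual implementation, and the `expand_intent` helper is hypothetical.

```python
from itertools import product

def expand_intent(intent):
    """Expand an intent into every combination of attribute choices.

    Each clause is either an "a|b|c" string or a list of names.
    Hypothetical helper for illustration; not part of the Lux API.
    """
    choices = []
    for clause in intent:
        if isinstance(clause, str):
            choices.append(clause.split("|"))
        else:
            choices.append(list(clause))
    # One combination per way of picking a single option from every clause.
    return [list(combo) for combo in product(*choices)]

combos = expand_intent(["Passes per match|Goals per match|overall|Tackles", "Position"])
for combo in combos:
    print(combo)
```

With four options in the first clause and one in the second, this yields four attribute pairs, each combining a performance measure with `Position`.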

### Specifying values of interest

In Lux, you can also specify particular values corresponding to subsets of the data that you might be interested in. For example, you may be interested in only Midfielders.



In [None]:
df.intent = ["Position=Midfielder"]
df

You can also specify multiple values of interest using the same `|` notation that we saw earlier. For example, you can look at both Midfielders and Defenders.

In [None]:
df.intent = ["Position=Midfielder|Defender"]
df

In [None]:
df.clear_intent()

### Applying Filters vs. Expressing Filter Intent

You might be wondering what the difference is between specifying values of interest through the intent in Lux and applying a filter directly on the dataframe through Pandas. When you apply the filter directly via Pandas, Lux is not aware of the inputs given to Pandas, so these values of interest will not be reflected in the recommendations.

In [None]:
df[df["Position"]=="Forward"]

Specifying the values through the `intent` property tells Lux that you are interested in Forwards. In the resulting Filter action, we see that Lux suggests visualizations for other `Position` values as recommendations.

In [None]:
df.intent = ["Position=Forward"]
df

So while both approaches apply the filter to the specified visualization, the subtle difference between *applying* a filter and *indicating* a filter intent leads to different sets of resulting recommendations. In general, we encourage using Pandas for filtering if you are certain about applying the filter (e.g., a cleaning operation deleting a specific data subset), and specifying the intent through Lux if you might want to experiment and change aspects related to the filter in your analysis.

### Advanced intent specification through `lux.Clause`

The basic string-based description provides a convenient way of specifying the intent. However, not every specification can be expressed through string-based descriptions; more complex specifications can be expressed through the `lux.Clause` object. The two modes of specification are essentially equivalent, with the Parser parsing the `description` field in the `lux.Clause` object.
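
To make the relationship between the two modes more concrete, here is a rough sketch of how a shorthand string could be broken into the fields that `lux.Clause` accepts (`attribute`, `filter_op`, `value`). The `parse_shorthand` function is an illustration written for this tutorial and handles only the `=` operator. Lux's real parser is more thorough and may work differently.

```python
def parse_shorthand(description):
    """Split a shorthand intent string into Clause-like fields.

    Illustrative only: handles "attr", "a|b" and "attr=v1|v2".
    Hypothetical helper; not part of the Lux API.
    """
    if "=" in description:
        # "attr=v1|v2" describes a filter on one attribute.
        attribute, raw_values = description.split("=", 1)
        values = raw_values.split("|")
        return {
            "attribute": attribute,
            "filter_op": "=",
            "value": values if len(values) > 1 else values[0],
        }
    # Otherwise the string names one attribute or several alternatives.
    attributes = description.split("|")
    return {"attribute": attributes if len(attributes) > 1 else attributes[0]}

print(parse_shorthand("overall"))
print(parse_shorthand("Position=Midfielder|Defender"))
```

The resulting dictionaries mirror the keyword arguments used with `lux.Clause` in the examples that follow.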

#### Specifying attributes or values of interest

To see an example of how `lux.Clause` is used, we rewrite our earlier example of expressing interest in `overall` as:

In [None]:
df.intent = [lux.Clause(attribute='overall')]
df

In [None]:
df.intent = ['overall',
                lux.Clause(attribute='nationality',filter_op='=', value=['England','France','Germany'])]
df

Both the `attribute` and `value` fields can take in either a single string or a list of attributes to specify items of interest. This example also demonstrates how we can intermix the `lux.Clause` specification alongside the basic string-based specification for convenience.

#### Adding constraints to override auto-inferred details

So far, we have seen examples of how Lux takes in a loosely specified intent and automatically fills in many of the details that are required to generate the intended visualizations. There are situations where the user may want to override these auto-inferred values. For example, you might be interested in fixing an attribute to show up on a particular axis, ensuring that an aggregated attribute is summed up instead of averaged by default, or picking a specific bin size for a histogram. Additional properties specified on `lux.Clause` act as constraints on the specified intent.

As we saw earlier, when we set `overall` as the intent, Lux generates a histogram with `overall` on the x-axis.
While this is unconventional, let's say that instead we want to set `overall` to the y axis. We would specify this as additional properties to constrain the intent clause.

In [None]:
df.intent = [lux.Clause(attribute='overall', channel='y')]
df

We can also set constraints on the type of aggregation that is used. By default, Lux uses `mean` as the aggregation function for quantitative attributes.

We can override the aggregation function to be `sum` instead. 

In [None]:
df.intent = ["value_eur",lux.Clause("overall",aggregation="sum")]
df

The possible aggregation values are the same as the ones supported in Pandas's [agg](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html) function, and can be given either as a string shorthand (e.g., "sum", "count", "min", "max", "median") or as a numpy aggregation function.


For example, we can change the aggregation function to be the peak-to-peak value, i.e. maximum minus minimum ([np.ptp](https://numpy.org/doc/stable/reference/generated/numpy.ptp.html)), by passing in the numpy function.

In [None]:
import numpy as np
df.intent = ["Position",lux.Clause("overall",aggregation=np.ptp)]
df
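
As a reminder of what this aggregation computes, `np.ptp` returns the range (maximum minus minimum) of a set of values. The plain-Python sketch below applies that definition per group. The ratings are made-up numbers rather than values from the dataset, and `peak_to_peak_by_group` is a hypothetical helper.

```python
from collections import defaultdict

def peak_to_peak_by_group(rows):
    """Return max - min of the values in each group (the quantity np.ptp computes)."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {group: max(values) - min(values) for group, values in groups.items()}

# Made-up (group, value) pairs for illustration only.
rows = [
    ("Forward", 82), ("Forward", 75), ("Forward", 90),
    ("Midfielder", 70), ("Midfielder", 88),
    ("Defender", 79),
]
ranges = peak_to_peak_by_group(rows)
print(ranges)
```

A group with a single value has a range of zero, which is worth keeping in mind when reading a bar chart aggregated this way.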

### Specifying wildcards

Let's say that you are interested in *any* attribute with respect to `value_eur`. Lux supports *wildcards* (based on [CompassQL](https://idl.cs.washington.edu/papers/compassql/)), which specify the enumeration of any possible attributes or values that satisfy the provided constraints.

In [None]:
df.intent = ['value_eur',lux.Clause('?')]
df

The space of enumeration can be narrowed based on constraints. For example, you might only be interested in looking at scatterplots of `value_eur` with respect to quantitative attributes. This narrows the 44 visualizations that we had earlier to only 28 visualizations now, involving only quantitative attributes.

In [None]:
df.intent = ['value_eur',lux.Clause('?',data_type='quantitative')]
df
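
One way to picture a constrained wildcard is as a filter over the dataframe's columns. The sketch below assumes a hand-written mapping from column name to data type (Lux infers these types itself) and a hypothetical `expand_wildcard` helper. The type labels given to the example columns are assumptions, not output from Lux.

```python
def expand_wildcard(fixed_attribute, column_types, data_type=None):
    """Pair `fixed_attribute` with every other column matching `data_type`.

    `column_types` maps column name -> type label.
    Hypothetical helper for illustration; not part of the Lux API.
    """
    pairs = []
    for column, column_type in column_types.items():
        if column == fixed_attribute:
            continue  # do not pair an attribute with itself
        if data_type is None or column_type == data_type:
            pairs.append((fixed_attribute, column))
    return pairs

# Assumed type labels for a few columns used in this tutorial.
column_types = {
    "value_eur": "quantitative",
    "overall": "quantitative",
    "Tackles": "quantitative",
    "Position": "nominal",
    "nationality": "nominal",
}
all_pairs = expand_wildcard("value_eur", column_types)
quantitative_pairs = expand_wildcard("value_eur", column_types, data_type="quantitative")
print(len(all_pairs), len(quantitative_pairs))
```

Adding the `data_type` constraint shrinks the enumerated set in the same way the constrained wildcard shrinks the list of recommended visualizations.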

The enumeration specifier can also be placed on the value field. For example, you might be interested in looking at how the distribution of `value_eur` varies for all possible values of `Position`.


In [None]:
df.intent = ['value_eur','Position=?']
df

In [None]:
df.intent = ['overall',lux.Clause(attribute='Position',filter_op='=',value='?')]
df

# Creating Desired Visualizations On-Demand using `Vis`

A `Vis` object represents an individual visualization displayed in Lux, which can either be automatically generated or defined by the user.

To generate a `Vis`, users should specify their intent and a source dataframe as inputs. The intent is expressed using the same intent specification language described in the previous section.

For example, here we indicate our intent for visualizing the `overall` attribute on the dataframe `df`.

In [None]:
from lux.vis.Vis import Vis
intent = ["overall"]
vis = Vis(intent,df)
vis

We can very easily replace the Vis's source data without changing the `Vis` definition, which is useful for comparing differences across different datasets with the same schema. 

For example, we might be interested in the same `overall` distribution, but plotted only on the subset of data with Forwards.

In [None]:
vis.refresh_source(df[df["Position"]=='Forward'])
vis

Likewise, we can modify the intent of the query, in this case, to increase the bin size of the histogram and to indicate the filtered source:

In [None]:
new_intent = [lux.Clause("overall",bin_size=50),"Position=Forward"]
vis.set_intent(new_intent)
vis

`Vis` objects are powerful programmatic representations of visualizations that can be exported into visualization code (more in the next tutorial) or be composed into a `VisList` collection.

# Working with Collections of Visualizations with `VisList`

`VisList` objects represent collections of visualizations in Lux.

There are two ways to specify lists of visualizations in Lux: 1) by specifying intent or 2) by manually composing `Vis` objects into a list.

### Approach #1: Specifying `VisList` using intent syntax

First, we look at an example of a `VisList` created through a user intent. Here, we create a vis collection of `overall` with respect to all other attributes, using the wildcard "?" symbol.

In [None]:
from lux.vis.VisList import VisList
vc = VisList(["overall","?"],df)
vc

Alternatively, we can specify desired attributes via a list with respect to `overall`: 

In [None]:
vc = VisList(["overall",['Passes per match','Goals per match','Tackles','Position']],df)
vc

### Approach #2: Specifying `VisList` by constructing `Vis` objects

`VisList` can be manually constructed by individually specifying the content of each `Vis`, then finally putting the entire list into a `VisList` object.

Here is the equivalent `VisList` example constructed using this approach:

In [None]:
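from lux.vis.Vis import Vis
from lux.vis.VisList import VisList

# Build one Vis per attribute, each paired with `overall`
vcLst = []
for attribute in ['Passes per match','Goals per match','Tackles','Position']: 
    vis = Vis([lux.Clause("overall"), lux.Clause(attribute)])
    vcLst.append(vis)
# Attach the source dataframe when composing the list
vc = VisList(vcLst,df)
vc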