# DBSCAN (Beta)

In [None]:
DBSCAN(name: str,
       cursor = None,
       eps: float = 0.5,
       min_samples: int = 5,
       p: int = 2)

Creates a DBSCAN object by using the DBSCAN algorithm as defined by Martin 
Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. This object uses 
pure SQL to compute all the distances and neighbors, and Python to compute 
the cluster propagation (the non-scalable phase). Because it relies on a 
CROSS JOIN, it may be expensive in some cases. To limit that cost, it indexes 
all the elements of the table so that the CROSS JOIN is performed only on 
integer IDs. Because DBSCAN uses the p-distance, it is highly sensitive to 
unnormalized data. However, DBSCAN is robust to outliers and can find 
non-linear clusters, which makes it a powerful algorithm for outlier 
detection and clustering.
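
The two phases described above can be illustrated with a minimal, pure-Python sketch. It is not the library's implementation: the function name `dbscan_sketch`, the in-memory list of points, and the convention that a point counts as its own neighbor are all assumptions made for illustration. The all-pairs loop plays the role of the CROSS JOIN on integer IDs, and the queue-based loop plays the role of the cluster propagation.

```python
from collections import deque

def dbscan_sketch(points, eps=0.5, min_samples=5, p=2):
    """Illustrative DBSCAN on a small in-memory list of points (hypothetical helper)."""
    n = len(points)

    def p_distance(a, b):
        # Minkowski (p-)distance between two points.
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

    # Phase 1: all-pairs neighbor search, analogous to a CROSS JOIN on integer IDs.
    # Assumption: a point is counted as its own neighbor.
    neighbors = {
        i: [j for j in range(n) if p_distance(points[i], points[j]) <= eps]
        for i in range(n)
    }
    core = {i for i in range(n) if len(neighbors[i]) >= min_samples}

    # Phase 2: cluster propagation from core points (breadth-first search).
    labels = [-1] * n  # -1 marks noise / not yet assigned
    n_cluster = 0
    for i in range(n):
        if i not in core or labels[i] != -1:
            continue
        labels[i] = n_cluster
        queue = deque([i])
        while queue:
            current = queue.popleft()
            for j in neighbors[current]:
                if labels[j] == -1:
                    labels[j] = n_cluster
                    # Only core points keep expanding the cluster.
                    if j in core:
                        queue.append(j)
        n_cluster += 1

    n_noise = labels.count(-1)
    return labels, n_cluster, n_noise

# Two tight groups of three points and one isolated point.
points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
labels, n_cluster, n_noise = dbscan_sketch(points, eps=0.5, min_samples=3)
print(labels, n_cluster, n_noise)
```

The neighbor search grows quadratically with the number of rows, which is why the CROSS JOIN can be expensive on large tables.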

### Parameters

<table id="parameters">
    <tr> <th>Name</th> <th>Type</th> <th>Optional</th> <th>Description</th> </tr>
    <tr> <td><div class="param_name">name</div></td> <td><div class="type">str</div></td> <td><div class = "no">&#10060;</div></td> <td>Name of the model. As it is not a built-in model, this name will be used to build the final table.</td> </tr>
    <tr> <td><div class="param_name">cursor</div></td> <td><div class="type">DBcursor</div></td> <td><div class = "yes">&#10003;</div></td> <td>Vertica DB cursor.</td> </tr>
    <tr> <td><div class="param_name">eps</div></td> <td><div class="type">float</div></td> <td><div class = "yes">&#10003;</div></td> <td>The radius of a neighborhood with respect to some point.</td> </tr>
    <tr> <td><div class="param_name">min_samples</div></td> <td><div class="type">int</div></td> <td><div class = "yes">&#10003;</div></td> <td>Minimum number of points required to form a dense region.</td> </tr>
    <tr> <td><div class="param_name">p</div></td> <td><div class="type">int</div></td> <td><div class = "yes">&#10003;</div></td> <td>The p of the p-distance (distance metric used during the model computation).</td> </tr>
</table>
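
To make the interaction between `eps` and `p` concrete, here is a small sketch of the p-distance (Minkowski distance) in plain Python. The helper name `p_distance` is hypothetical and is not part of the library; the sketch only shows that the same pair of points can fall inside or outside a neighborhood of radius `eps` depending on `p`.

```python
def p_distance(a, b, p=2):
    """Minkowski distance: (sum(|a_i - b_i| ** p)) ** (1 / p). Hypothetical helper."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

a, b = (0.0, 0.0), (3.0, 4.0)

manhattan = p_distance(a, b, p=1)  # p = 1: sum of absolute differences
euclidean = p_distance(a, b, p=2)  # p = 2: straight-line distance

eps = 6.0
# With this eps, b is a neighbor of a under p = 2 but not under p = 1.
is_neighbor_p1 = manhattan <= eps
is_neighbor_p2 = euclidean <= eps
print(manhattan, euclidean, is_neighbor_p1, is_neighbor_p2)
```

Because every coordinate contributes to the sum on its own scale, a column with a much larger range than the others dominates the distance, which is why unnormalized data distorts the neighborhoods.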

### Attributes

Once the object is created, all the parameters become attributes. Fitting the model creates these additional attributes:

<table id="parameters">
    <tr> <th>Name</th> <th>Type</th>  <th>Description</th> </tr>
    <tr> <td><div class="param_name">n_cluster</div></td> <td><div class="type">int</div></td> <td>Number of clusters created during the process.</td> </tr>
    <tr> <td><div class="param_name">n_noise</div></td> <td><div class="type">int</div></td> <td>Number of points that do not belong to any cluster (noise).</td> </tr>
    <tr> <td><div class="param_name">input_relation</div></td> <td><div class="type">str</div></td> <td>Train relation.</td> </tr>
    <tr> <td><div class="param_name">X</div></td> <td><div class="type">list</div></td> <td>List of the predictors.</td> </tr>
    <tr> <td><div class="param_name">key_columns</div></td> <td><div class="type">list</div></td> <td>Columns not used during the algorithm computation but which will be used to create the final relation.</td> </tr>
</table>

### Methods

<table id="parameters">
    <tr> <th>Name</th> <th>Description</th> </tr>
    <tr> <td><a href="../Unsupervised/fit2">fit</a></td> <td>Trains the model.</td> </tr>
    <tr> <td><a href="../Unsupervised/info">info</a></td> <td>Displays some information about the model.</td> </tr>
    <tr> <td><a href="../Unsupervised/plot2">plot</a></td> <td>Draws the model if the number of predictors is 2 or 3.</td> </tr>
    <tr> <td><a href="../Unsupervised/to_vdf">to_vdf</a></td> <td>Creates a vDataFrame of the model.</td> </tr>
</table>

### Example

In [1]:
from vertica_ml_python.learn.cluster import DBSCAN
model = DBSCAN(name = "public.DBSCAN_heart")
print(model)

<DBSCAN>
