# Quickstart

## Install Spark

Koalas requires Spark. You can install Spark with pip, with conda, or manually.


**Manual installation**

Download a Spark release [here](https://spark.apache.org/downloads.html), then set the `PYTHONPATH` environment variable to include `pyspark.zip` and the zipped Py4J file under `SPARK_HOME/python/lib`.
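If you would rather configure the path from Python than in your shell profile, the sketch below adds the same two archives to `sys.path` for the current interpreter. It assumes `SPARK_HOME` points at an unpacked Spark release with the standard `python/lib` layout and a Py4J archive named like `py4j-<version>-src.zip`. The helper `add_spark_to_path` is hypothetical and not part of Spark.

```python
import glob
import os
import sys


def add_spark_to_path(spark_home):
    """Prepend Spark's Python archives to sys.path (hypothetical helper).

    Assumes the layout SPARK_HOME/python/lib containing pyspark.zip and
    one or more py4j-<version>-src.zip archives.
    """
    lib_dir = os.path.join(spark_home, "python", "lib")
    pyspark_zip = os.path.join(lib_dir, "pyspark.zip")
    py4j_zips = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    if not os.path.isfile(pyspark_zip) or not py4j_zips:
        raise FileNotFoundError(f"pyspark.zip or a Py4J zip is missing under {lib_dir}")

    added = [pyspark_zip] + py4j_zips
    # Insert in reverse so the final order at the front of sys.path matches `added`.
    for path in reversed(added):
        if path not in sys.path:
            sys.path.insert(0, path)
    return added
```

Call it before the first `import pyspark`, for example `add_spark_to_path(os.environ["SPARK_HOME"])`. Unlike `PYTHONPATH`, this only affects the running process.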


**PIP installation**

`pip install pyspark`

**Conda installation**

`conda install -c conda-forge pyspark`


## Install Koalas

After installing Spark, you can install Koalas with pip or conda.

**PIP installation**

`pip install koalas`

**Conda installation**

`conda install -c conda-forge koalas`
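To confirm that both packages are visible to the interpreter you plan to use, you can query their installed versions. This is a minimal sketch assuming Python 3.8+ (for `importlib.metadata`) and the distribution names `pyspark` and `koalas` used in the install commands above. The helper `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the two packages this guide installs.
for name in ("pyspark", "koalas"):
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```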

## Install Koalas in Databricks notebooks

Koalas requires Databricks Runtime 5.x or above. For the regular Databricks Runtime, you can install Koalas from the Libraries tab in the cluster UI, or with `dbutils` in a notebook as shown below.

```python
dbutils.library.installPyPI("koalas")
dbutils.library.restartPython()
```

You can import and run [this notebook](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1177812384365889/914420683578197/4036358933921776/latest.html) to install Koalas and get started.

In the future, we will package Koalas out-of-the-box in both the regular Databricks Runtime and Databricks Runtime for Machine Learning.

## Create and manipulate a Koalas DataFrame

Now you can turn a pandas DataFrame into a Koalas DataFrame, which exposes the same API as pandas.

```python
import databricks.koalas as ks
import pandas as pd

pdf = pd.DataFrame({'x': range(3), 'y': ['a', 'b', 'b'], 'z': ['a', 'b', 'b']})

# Create a Koalas DataFrame from the pandas DataFrame
df = ks.from_pandas(pdf)

# Rename the columns
df.columns = ['x', 'y', 'z1']

# Add a new column computed from an existing one
df['x2'] = df.x * df.x

# Display the Koalas DataFrame
df
```

|   | x | y | z1 | x2 |
|---|---|---|----|----|
| 0 | 0 | a | a  | 0  |
| 1 | 1 | b | b  | 1  |
| 2 | 2 | b | b  | 4  |
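Because Koalas mirrors the pandas API, the same statements run unchanged on a plain pandas DataFrame, which is a convenient way to sanity-check the result above without a Spark cluster. This is a pandas-only sketch assuming just `pandas` is installed; on the Koalas side, `df.to_pandas()` collects a Koalas DataFrame back into pandas if you want to compare the two directly.

```python
import pandas as pd

# Same steps as the Koalas example, applied directly to pandas.
pdf = pd.DataFrame({'x': range(3), 'y': ['a', 'b', 'b'], 'z': ['a', 'b', 'b']})

# Rename the columns
pdf.columns = ['x', 'y', 'z1']

# Add a new column computed from an existing one
pdf['x2'] = pdf.x * pdf.x

print(pdf)
```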


To learn more, check out [10 minutes to Koalas](https://koalas.readthedocs.io/en/latest/getting_started/10min.html).