Update: There have been new releases of pandas since this was written; the current version is 1.0.3 (March 17, 2020). See [Release Notes](https://pandas.pydata.org/docs/whatsnew/index.html#release) for a full changelog, including other versions of pandas.



# What’s new in Pandas 1.0.0


![](https://pbs.twimg.com/profile_banners/1015706109255053317/1572020335/1500x500)

[pandas](https://dev.pandas.io/docs/index.html) is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
The first release candidate for pandas 1.0.0 has arrived, and here is a brief summary of some of the interesting enhancements that come with the release.

For a detailed version, please refer to the [Pandas 1.0 Release notes](https://dev.pandas.io/docs/whatsnew/v1.0.0.html).

<div class="alert alert-block alert-info">
<h3>Note:</h3> Pandas 1.0 has dropped support for Python 2 and requires at least Python 3.6.1.
</div>

In [None]:
# Upgrading pandas to 1.0

!pip install --upgrade pandas==1.0.0rc0

In [None]:
!pip show pandas

# Enhancements

## [1. Extended verbose info output for DataFrame](https://dev.pandas.io/docs/reference/api/pandas.DataFrame.info.html#pandas.DataFrame.info)

Any data analysis typically begins by exploring a DataFrame. `pd.DataFrame.info` prints a concise summary of a DataFrame: the index dtype, the column dtypes, the non-null counts and the memory usage.

In 1.0, `DataFrame.info()` lays the column summary out as a table with headers (`#`, `Column`, `Non-Null Count`, `Dtype`) and numbers each column, alongside the other details.
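
If you want that summary as a string (for logging or tests) rather than printed output, `DataFrame.info` accepts a `buf` argument that can be any writable buffer, plus a `verbose` flag. A minimal sketch, reusing the small example DataFrame from this section:

```python
import io

import pandas as pd

df = pd.DataFrame({"int_col": [1, 2, 3],
                   "text_col": ["a", "b", "c"],
                   "float_col": [0.0, 0.1, 0.2]})

# Write the summary into an in-memory buffer instead of stdout
buffer = io.StringIO()
df.info(buf=buffer, verbose=True)
summary = buffer.getvalue()

# In pandas >= 1.0 the per-column table has "#", "Column",
# "Non-Null Count" and "Dtype" headers
print(summary)
```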

In [None]:
import pandas as pd
df = pd.DataFrame({"int_col": [1, 2, 3],
                    "text_col": ["a", "b", "c"],
                    "float_col": [0.0, 0.1, 0.2]})

df
   

### *pandas 1.0.0*

In [None]:
df.info()

### *pandas 0.25.x*

![](https://imgur.com/kTd0AJk.png)

## [2. Converting to Markdown](https://dev.pandas.io/docs/reference/api/pandas.DataFrame.to_markdown.html#pandas.DataFrame.to_markdown)

DataFrames can now be exported to Markdown tables with the `to_markdown()` method. This is pretty neat. Note that it relies on the optional `tabulate` package, which has to be installed separately.

In [None]:
df = pd.DataFrame(data={"Name": ["Annie", "Cassie", "Tom"], "Country": ["Japan", "France", "Canada"]})
df

### *pandas 1.0.0*

In [None]:
print(df.to_markdown())

## [3. Dedicated datatypes for strings](https://dev.pandas.io/docs/reference/api/pandas.StringDtype.html#pandas.StringDtype)

The latest version of pandas introduces experimental data types for strings. Earlier, there was only one option to store text data, and that was the `object` dtype. This sometimes created problems, since one could accidentally store a mixture of strings and non-strings in an object-dtype array.

Version 1.0 adds `StringDtype`, a dedicated dtype for string data.

However, this implementation comes with the following warning which needs to be kept in mind.

<div class="alert alert-block alert-warning">
<h3 class="admonition-title">Warning</h3>
 StringDtype is currently considered experimental. The implementation and parts of the API may change without warning.
</div>
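
A minimal sketch of opting in to the new dtype, using a small hypothetical Series rather than the tweets dataset. Passing `dtype="string"` gives a `StringDtype` column whose missing values are `pd.NA`, while the same data without the opt-in keeps the generic `object` dtype:

```python
import pandas as pd

# Opt in to the dedicated string dtype explicitly
s = pd.Series(["flood", "wildfire", None], dtype="string")
print(s.dtype)    # string
print(s[2])       # <NA>: the missing entry is stored as pd.NA

# The same values with the generic object dtype
obj = pd.Series(["flood", "wildfire", None], dtype=object)
print(obj.dtype)  # object
```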



In [None]:
# dataset consisting of disaster tweets
disaster_tweets = pd.read_csv("../input/nlp-getting-started/train.csv")
disaster_tweets.head()

### *pandas 1.0.0*

In [None]:
disaster_tweets['text'] = disaster_tweets['text'].astype('string')
disaster_tweets.dtypes

It is recommended to explicitly use the string data type when working with strings. Doing so has several advantages:

#### 1. Easily select the string columns using `select_dtypes()`. Previously, text columns had the generic `object` dtype, so `select_dtypes('object')` could not tell them apart from other object columns, such as mixed-type ones.
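
As a self-contained sketch (the DataFrame and its column names are hypothetical), only the column explicitly stored with the `string` dtype is selected, while a column mixing strings and numbers stays `object`:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": pd.Series(["fire", "flood", "storm"], dtype="string"),
    # Mixed strings and numbers can only live in an object column
    "mixed": pd.Series(["a", 1, 2.5], dtype=object),
})

# Only the dedicated string column is picked up
print(df.select_dtypes("string").columns.tolist())  # ['text']
print(df["mixed"].dtype)                            # object
```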

In [None]:
disaster_tweets.select_dtypes('string')[:4]

#### 2. The usual `.str` accessor methods can be used for data manipulation
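
On the `string` dtype the accessor results are typed as well: per the pandas 1.0 release notes, methods returning integers such as `str.len()` give the nullable `Int64` dtype, and methods returning booleans such as `str.contains()` give the nullable `boolean` dtype, with `pd.NA` propagated for missing entries. A small sketch on a hypothetical two-element Series:

```python
import pandas as pd

tweets = pd.Series(["Forest fire spotted", None], dtype="string")

upper = tweets.str.upper()              # stays "string", <NA> preserved
lengths = tweets.str.len()              # nullable integer dtype
has_fire = tweets.str.contains("fire")  # nullable boolean dtype

print(upper.dtype, lengths.dtype, has_fire.dtype)  # string Int64 boolean
```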

In [None]:
disaster_tweets.text.str.upper()[:5]

In [None]:
disaster_tweets.text.str.lower()[:5]

In [None]:
disaster_tweets.text.str.split()[:5]

### *pandas 0.25.x*

Previous versions of pandas raise an error when the `string` data type is requested:

```
disaster_tweets['text'] = disaster_tweets['text'].astype('string')
disaster_tweets.dtypes
```

![](https://imgur.com/1tra5c2.png)

## [4. Dedicated datatypes for booleans with missing value support](https://dev.pandas.io/docs/whatsnew/v1.0.0.html#boolean-data-type-with-missing-values-support)

pandas 1.0 adds an extension type dedicated to boolean data that can also hold missing values. The `target` column of the dataset above contains either `1` or `0`. Let's convert it into a boolean data type.
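
A minimal sketch of the same conversion on a hypothetical Series of 0/1 labels, contrasting the nullable `boolean` dtype with NumPy's plain `bool`:

```python
import pandas as pd

# 0/1 labels stored as int64, like the "target" column
target = pd.Series([1, 0, 1, 0])

nullable = target.astype("boolean")  # pandas' nullable BooleanDtype
numpy_bool = target.astype(bool)     # plain NumPy bool dtype

print(nullable.dtype, numpy_bool.dtype)  # boolean bool
```

Both hold the same values here; the difference only shows up once missing values are involved.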

In [None]:
disaster_tweets['target'] = disaster_tweets['target'].astype('boolean')
disaster_tweets.info()

The `Dtype` column now reflects two new data types, i.e. `string` and `boolean`.

### Support for missing values
Another important point: the **default bool** data type is backed by a bool-dtype NumPy array, so such a column can only hold `True` or `False`, not missing values. The new **BooleanArray** can store missing values as well, by keeping track of them in a separate mask.
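
Because missing values are tracked separately, logical operations on the `boolean` dtype follow three-valued (Kleene) logic, as described in the pandas 1.0 release notes: a missing value only stays missing when the result actually depends on it. A small sketch with hypothetical values:

```python
import pandas as pd

flags = pd.array([True, False, None], dtype="boolean")
print(flags.isna())  # only the last entry is missing

# OR with True is True whatever the unknown value is
print(flags | True)
# AND with False is False whatever the unknown value is
print(flags & False)
# AND with True depends on the unknown value, so <NA> propagates
depends = flags & True
print(depends)
```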

In [None]:
# pandas 1.0
pd.Series([True, False, None], dtype='boolean')

```
# pandas 0.25.x
pd.Series([True, False, None], dtype='bool')
```
![](https://imgur.com/wdwBeCA.png)

## [5. Experimental NA scalar to denote missing values](https://dev.pandas.io/docs/whatsnew/v1.0.0.html#experimental-na-scalar-to-denote-missing-values)

pandas uses several different values to represent missing data, depending on the data type:

* `np.nan` for float data
* `np.nan` or `None` for object-dtype data
* `pd.NaT` for datetime-like data
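
To see these markers side by side, here is a small sketch with hypothetical values; all three are recognised by `pd.isna`, but each dtype uses its own sentinel:

```python
import numpy as np
import pandas as pd

floats = pd.Series([1.0, None])                           # float64
objects = pd.Series(["a", None], dtype=object)            # object
dates = pd.Series(pd.to_datetime(["2020-01-29", None]))   # datetime64

print(floats[1], objects[1], dates[1])  # nan None NaT
```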

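pandas 1.0 introduces a new singleton, `pd.NA`, to represent scalar missing values. The goal is a single missing-value indicator that can be used consistently across data types (instead of `np.nan`, `None` or `pd.NaT` depending on the data type). For now, `pd.NA` is used by the nullable integer and boolean data types and the new string data type; the default dtypes keep their existing markers. However, this too comes with a warning: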

<div class="alert alert-block alert-warning">
<h3 class="admonition-title">Warning</h3>
 Experimental: the behaviour of pd.NA can still change without warning.
</div>
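
A short sketch of how `pd.NA` behaves as a scalar: arithmetic and comparisons generally propagate it, operations whose result does not depend on the missing value return a regular result, and using it where a plain `bool` is required raises `TypeError`. The variable names are illustrative:

```python
import pandas as pd

# The nullable integer dtype stores its missing entry as pd.NA
ints = pd.Series([1, 2, None], dtype="Int64")
print(ints[2] is pd.NA)  # True

# Arithmetic and comparisons propagate NA ...
print(pd.NA + 1)   # <NA>
print(pd.NA == 1)  # <NA>

# ... unless the result does not depend on the missing value
print(pd.NA | True)  # True
print(pd.NA ** 0)    # 1

# NA has no truth value
try:
    bool(pd.NA)
    ambiguous = False
except TypeError as err:
    ambiguous = True
    print("TypeError:", err)
```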


In [None]:
s = pd.Series([1, 2, None], dtype="Int64")
s

In [None]:
s[2]

These are some of the most useful changes in pandas 1.0.0. However, it is worth going through the full changelog to see other [important enhancements](https://dev.pandas.io/docs/whatsnew/v1.0.0.html#other-enhancements) to the pandas library.