ValueError: cannot reindex from a duplicate axis  
https://stackoverflow.com/questions/27236275/what-does-valueerror-cannot-reindex-from-a-duplicate-axis-mean  

Official reference: https://pandas.pydata.org/docs/user_guide/indexing.html?highlight=valueerror  
Possibly this bug? https://github.com/pandas-dev/pandas/issues/30667


In [1]:
import pandas as pd
import numpy as np
pd.options.display.notebook_repr_html = False  # Controls the output format in Jupyter Notebook; the code runs without it.

In [2]:
# Check the runtime environment
print(pd.__version__)
print(np.__version__)

1.0.1
1.18.1


## Occurs with reindex

In [3]:
s = pd.Series(np.arange(4), index=['a', 'a', 'b', 'c'])
s

a    0
a    1
b    2
c    3
dtype: int64

In [4]:
labels = ['c', 'd']

In [5]:
s.reindex(labels)

ValueError: cannot reindex from a duplicate axis
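
One way around the error above is to make the index unique before calling `reindex`. The following is a minimal sketch, assuming it is acceptable to keep only the first row for each duplicated label (`keep='first'` is a choice made here for illustration, not something these notes prescribe).

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(4), index=['a', 'a', 'b', 'c'])
labels = ['c', 'd']

# The index holds 'a' twice, so it is not unique.
has_unique_index = s.index.is_unique
print(has_unique_index)

# Assumption: keeping the first occurrence of each label is acceptable.
deduped = s[~s.index.duplicated(keep='first')]

# With a unique index, reindex succeeds; the missing label 'd' becomes NaN.
result = deduped.reindex(labels)
print(result)
```

Checking `s.index.is_unique` first is a cheap way to decide whether deduplication is needed at all.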

## Occurs with merge  
https://qiita.com/waterada/items/c239a6d0424537cfcfb9

In [6]:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [5, 6, 7, 8]})

In [7]:
df1.merge(df2)

  lkey  value rkey
0  foo      5  foo

In [8]:
df1

  lkey  value
0  foo      1
1  bar      2
2  baz      3
3  foo      5

In [9]:
df2

  rkey  value
0  foo      5
1  bar      6
2  baz      7
3  foo      8

In [10]:
df1.merge(df2, left_on='lkey', right_on='rkey')

  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  foo        5  foo        5
3  foo        5  foo        8
4  bar        2  bar        6
5  baz        3  baz        7
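
The four `foo` rows above come from the duplicated key on both sides. A minimal sketch of how such duplicates could be caught early, assuming a one-to-one relationship is what was intended (that intent is an assumption, not something stated in these notes), uses the `validate` argument of `DataFrame.merge`:

```python
import pandas as pd

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [5, 6, 7, 8]})

# Duplicated keys on each side ('foo' appears twice in both frames).
left_dups = int(df1['lkey'].duplicated().sum())
right_dups = int(df2['rkey'].duplicated().sum())

# validate='one_to_one' asks pandas to check key uniqueness before merging.
# The raised MergeError is a subclass of ValueError, so ValueError is caught here.
try:
    df1.merge(df2, left_on='lkey', right_on='rkey', validate='one_to_one')
    merge_rejected = False
except ValueError as exc:
    merge_rejected = True
    print(exc)

# Without validation the merge succeeds and the duplicated keys multiply rows.
merged = df1.merge(df2, left_on='lkey', right_on='rkey')
print(len(merged))
```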

## Occurs when assigning a new row

In [11]:
a = np.arange(35).reshape(5,7)

In [12]:
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=list(range(10, 15)) + ['p', 'p'])


In [13]:
df

   10  11  12  13  14   p   p
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

In [14]:
df.values.dtype

dtype('int64')

In [15]:
df.loc['sums'] = df.sum(axis=0)

In [16]:
df

      10  11  12  13  14   p    p
x      0   1   2   3   4   5    6
y      7   8   9  10  11  12   13
u     14  15  16  17  18  19   20
z     21  22  23  24  25  26   27
w     28  29  30  31  32  33   34
sums  70  75  80  85  90  95  100

In [17]:
df.loc[:, 'p']

       p    p
x      5    6
y     12   13
u     19   20
z     26   27
w     33   34
sums  95  100
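
Because the label `'p'` appears twice, selecting it returns two columns. A minimal sketch of detecting the duplicated labels and keeping only the first column per label, assuming the later duplicate can be discarded (an assumption made for illustration):

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'],
                  columns=list(range(10, 15)) + ['p', 'p'])

# Mark every column whose label occurs more than once.
dup_mask = df.columns.duplicated(keep=False)
dup_labels = list(df.columns[dup_mask].unique())
print(dup_labels)

# Assumption: only the first column of each duplicated label is needed.
first_only = df.loc[:, ~df.columns.duplicated(keep='first')]

# 'p' now refers to a single column, so selection returns a Series.
print(first_only['p'])
```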

In [18]:
import pandas 
import numpy as np

a = np.array([[1,2],[3,4]]) 

# Does NOT work:
b = np.array([[0.5,6],[7,8]])  
# b = np.array([[.5,6],[7,8]])  # The same problem

# This one works fine:
# b = np.array([[5,6],[7,8]]) 

dfA = pandas.DataFrame(a)
# This works fine even with 0.5, because the column names are different
# dfA = pandas.DataFrame(a, columns=['a','b'])
dfB = pandas.DataFrame(b)

df_new = pandas.concat([dfA, dfB], axis = 1)

print(df_new[df_new > 5])

ValueError: cannot reindex from a duplicate axis

In [19]:
dfA

   0  1
0  1  2
1  3  4

In [20]:
dfB

     0    1
0  0.5  6.0
1  7.0  8.0

In [21]:
df_new > 5

       0      1      0     1
0  False  False  False  True
1  False  False   True  True
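
The comparison itself works; it is the boolean selection on duplicated column labels that fails. A minimal sketch of one workaround, assuming the original column labels do not need to be preserved, is to let `concat` renumber the columns with `ignore_index=True`:

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame(np.array([[1, 2], [3, 4]]))
dfB = pd.DataFrame(np.array([[0.5, 6], [7, 8]]))

# ignore_index=True relabels the concatenated axis 0..n-1,
# so the resulting column labels are unique.
df_new = pd.concat([dfA, dfB], axis=1, ignore_index=True)
print(list(df_new.columns))

# Boolean selection now works; entries that fail the test become NaN.
masked = df_new[df_new > 5]
print(masked)
```

If the origin of each column matters, passing `keys=` to `concat` instead would keep the labels distinguishable through a column MultiIndex.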

## DataFrame with duplicate column names

https://stackoverflow.com/questions/30788061/valueerror-cannot-reindex-from-a-duplicate-axis-using-isin-with-pandas